HumanEval+
Official shard eval
HumanEval+ randomized eval shards from evalplus/humanevalplus:test. Question text lives in S3; Postgres stores only shard metadata and results.
Category: Coding · Eval type: Shard eval · Questions: 164 · Shards: 13 · Runs: 35
Dry-run first:
lmx eval shard humaneval-plus --base-url http://localhost:8000 --questions 116 --dry-run
Then submit with a real model and hardware profile:
lmx eval shard humaneval-plus --base-url http://localhost:8000 --questions 116 --model <hfId> --hardware hardware.json --submit
Scores are pooled by unique question_id; the leaderboard is ranked by Wilson 95% lower bound.
Leaderboard
nvidia/Gemma-4-31B-IT-NVFP4 · 32 runs · 5/13 shards · harness-scoped
92.7%
95% CI 87.6–95.8%
152/164 correct · 100.0% coverage
2
gemma-4-26B-A4B-it-qat-GGUFQ4_K_XL · gguf
unsloth/gemma-4-26B-A4B-it-qat-GGUF · 1 run · 1/13 shards · harness-scoped
93.0%
95% CI 86.3–96.6%
93/100 correct · 61.0% coverage
3
Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF · 1 run · 1/13 shards · harness-scoped
90.9%
95% CI 85.5–94.4%
149/164 correct · 100.0% coverage
4
gemma-4-12b-it-GGUFQ8_K_XL · gguf
unsloth/gemma-4-12b-it-GGUF · 1 run · 1/13 shards · harness-scoped
90.0%
95% CI 82.6–94.5%
90/100 correct · 61.0% coverage
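The ranking rule (Wilson 95% lower bound) can be illustrated with a minimal sketch. The function name `wilson_interval`, the z value of 1.96, and the row labels are assumptions made for illustration, not the leaderboard's actual code. Under those assumptions the results appear to agree with the intervals listed above to one decimal place.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Return the (lower, upper) Wilson score interval for correct/total.

    z=1.96 is the usual two-sided 95% value; that the leaderboard uses
    exactly this constant is an assumption.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


# (correct, total) pairs copied from the four leaderboard rows above.
rows = {
    "152/164": (152, 164),
    "93/100": (93, 100),
    "149/164": (149, 164),
    "90/100": (90, 100),
}

# Rank by the lower bound, highest first.
ranked = sorted(rows, key=lambda k: wilson_interval(*rows[k])[0], reverse=True)

for label in ranked:
    lo, hi = wilson_interval(*rows[label])
    print(f"{label}: 95% CI {lo * 100:.1f}-{hi * 100:.1f}%")
```

The 93/100 row has the higher raw accuracy, yet 152/164 ranks first: the larger sample gives a tighter interval and therefore a higher lower bound.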
Stability: historical rerun transparency
Leaderboard rank uses the canonical latest approved answer per question_id. These metrics include historical submissions too, so reruns and changed answers are visible but do not drive rank.
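A minimal sketch of the canonical-latest rule described above. The `Submission` record, its `submitted_at` field, and the helper names are hypothetical; the real schema and the approval step are not shown on this page, so both are simplified away. Treating the historical figure as a plain mean over all rows is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical record shape, for illustration only.
    question_id: str
    submitted_at: int  # any sortable timestamp
    passed: bool


def canonical_latest(rows):
    """Keep only the most recent submission for each question_id."""
    latest = {}
    for row in rows:
        current = latest.get(row.question_id)
        if current is None or row.submitted_at > current.submitted_at:
            latest[row.question_id] = row
    return latest


def accuracy(rows):
    """Fraction of rows that passed; 0.0 for an empty input."""
    rows = list(rows)
    return sum(r.passed for r in rows) / len(rows) if rows else 0.0


history = [
    Submission("HumanEval/1", 1, False),
    Submission("HumanEval/1", 2, True),  # a rerun changes the answer
    Submission("HumanEval/2", 1, True),
    Submission("HumanEval/3", 1, False),
]

canonical = canonical_latest(history)
canonical_score = accuracy(canonical.values())  # one row per question_id
historical_average = accuracy(history)          # every row, reruns included
```

In this toy history the rerun of HumanEval/1 lifts the canonical score to 2 of 3, while the earlier failure still counts in the historical average of 2 of 4.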
32 historical runs · 164 unique questions
Canonical
92.7%
Row avg
93.0%
Run avg
93.0%
Repeated
3,548
Changed
7
gemma-4-26B-A4B-it-qat-GGUFQ4_K_XL · gguf
0 historical runs · 100 unique questions
Canonical
93.0%
Row avg
—
Run avg
—
Repeated
0
Changed
0
1 historical run · 164 unique questions
Canonical
90.9%
Row avg
90.9%
Run avg
90.9%
Repeated
0
Changed
0
gemma-4-12b-it-GGUFQ8_K_XL · gguf
0 historical runs · 100 unique questions
Canonical
90.0%
Row avg
—
Run avg
—
Repeated
0
Changed
0
Runs: sample traces per run
by Lottolabs · shard 9 · 6/29/2026, 6:09:35 AM · cmqytijwd01c0oe010qufz5ca
90.9% · 149/164 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/125 · pass · 100.0% · 1728ms · 17cf15e8f2e5
Question
def split_words(txt):
'''
Given a string of words, return a list of words split on whitespace, if no whitespaces exists in the text you
should split on commas ',' if no commas exists you should return the number of lower-case letters with odd order in the
alphabet, ord('a') = 0, ord('b') = 1, ... ord('z') = 25
Examples
split_words("Hello world!") ➞ ["Hello", "world!"]
split_words("Hello,world!") ➞ ["Hello", "world!"]
split_words("abcdef") == 3
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def split_words(txt):
'''
Given a string of words, return a list of words split on whitespace, if no whitespaces exists in the text you
should split on commas ',' if no commas exists you should return the number of lower-case letters with odd order in the
alphabet, ord('a') = 0, ord('b') = 1, ... ord('z') = 25
Examples
split_words("Hello world!") ➞ ["Hello", "world!"]
split_words("Hello,world!") ➞ ["Hello", "world!"]
split_words("abcdef") == 3
'''
Model answer
def split_words(txt):
    if ' ' in txt:
        return txt.split()
    elif ',' in txt:
        return txt.split(',')
    else:
        count = 0
        for char in txt:
            if char.islower():
                order = ord(char) - ord('a')
                if order % 2 == 1:
                    count += 1
        return count
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/86 · pass · 100.0% · 2800ms · 7f48094ca79c
Question
def anti_shuffle(s):
"""
Write a function that takes a string and returns an ordered version of it.
Ordered version of string, is a string where all words (separated by space)
are replaced by a new word where all the characters arranged in
ascending order based on ascii value.
Note: You should keep the order of words and blank spaces in the sentence.
For example:
anti_shuffle('Hi') returns 'Hi'
anti_shuffle('hello') returns 'ehllo'
anti_shuffle('Hello World!!!') returns 'Hello !!!Wdlor'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def anti_shuffle(s):
"""
Write a function that takes a string and returns an ordered version of it.
Ordered version of string, is a string where all words (separated by space)
are replaced by a new word where all the characters arranged in
ascending order based on ascii value.
Note: You should keep the order of words and blank spaces in the sentence.
For example:
anti_shuffle('Hi') returns 'Hi'
anti_shuffle('hello') returns 'ehllo'
anti_shuffle('Hello World!!!') returns 'Hello !!!Wdlor'
"""
Model answer
def anti_shuffle(s):
    """
    Write a function that takes a string and returns an ordered version of it.
    Ordered version of string, is a string where all words (separated by space)
    are replaced by a new word where all the characters arranged in
    ascending order based on ascii value.
    Note: You should keep the order of words and blank spaces in the sentence.
    For example:
    anti_shuffle('Hi') returns 'Hi'
    anti_shuffle('hello') returns 'ehllo'
    anti_shuffle('Hello World!!!') returns 'Hello !!!Wdlor'
    """
    words = s.split(' ')
    result = []
    for word in words:
        sorted_word = ''.join(sorted(word))
        result.append(sorted_word)
    return ' '.join(result)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/140 · pass · 100.0% · 3600ms · 7bec51cd74df
Question
def fix_spaces(text):
"""
Given a string text, replace all spaces in it with underscores,
and if a string has more than 2 consecutive spaces,
then replace all consecutive spaces with -
fix_spaces("Example") == "Example"
fix_spaces("Example 1") == "Example_1"
fix_spaces(" Example 2") == "_Example_2"
fix_spaces(" Example   3") == "_Example-3"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def fix_spaces(text):
"""
Given a string text, replace all spaces in it with underscores,
and if a string has more than 2 consecutive spaces,
then replace all consecutive spaces with -
fix_spaces("Example") == "Example"
fix_spaces("Example 1") == "Example_1"
fix_spaces(" Example 2") == "_Example_2"
fix_spaces(" Example   3") == "_Example-3"
"""
Model answer
def fix_spaces(text):
    """
    Given a string text, replace all spaces in it with underscores,
    and if a string has more than 2 consecutive spaces,
    then replace all consecutive spaces with -
    fix_spaces("Example") == "Example"
    fix_spaces("Example 1") == "Example_1"
    fix_spaces(" Example 2") == "_Example_2"
    fix_spaces(" Example   3") == "_Example-3"
    """
    result = []
    i = 0
    while i < len(text):
        if text[i] == ' ':
            # Count consecutive spaces
            j = i
            while j < len(text) and text[j] == ' ':
                j += 1
            space_count = j - i
            if space_count > 2:
                result.append('-')
            else:
                result.append('_' * space_count)
            i = j
        else:
            result.append(text[i])
            i += 1
    return ''.join(result)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/89 · pass · 100.0% · 3143ms · faecea1d9457
Question
def encrypt(s):
"""Create a function encrypt that takes a string as an argument and
returns a string encrypted with the alphabet being rotated.
The alphabet should be rotated in a manner such that the letters
shift down by two multiplied to two places.
For example:
encrypt('hi') returns 'lm'
encrypt('asdfghjkl') returns 'ewhjklnop'
encrypt('gf') returns 'kj'
encrypt('et') returns 'ix'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def encrypt(s):
"""Create a function encrypt that takes a string as an argument and
returns a string encrypted with the alphabet being rotated.
The alphabet should be rotated in a manner such that the letters
shift down by two multiplied to two places.
For example:
encrypt('hi') returns 'lm'
encrypt('asdfghjkl') returns 'ewhjklnop'
encrypt('gf') returns 'kj'
encrypt('et') returns 'ix'
"""
Model answer
def encrypt(s):
    """Create a function encrypt that takes a string as an argument and
    returns a string encrypted with the alphabet being rotated.
    The alphabet should be rotated in a manner such that the letters
    shift down by two multiplied to two places.
    For example:
    encrypt('hi') returns 'lm'
    encrypt('asdfghjkl') returns 'ewhjklnop'
    encrypt('gf') returns 'kj'
    encrypt('et') returns 'ix'
    """
    result = []
    for char in s:
        if char.isalpha():
            if char.islower():
                base = ord('a')
            else:
                base = ord('A')
            shifted = (ord(char) - base + 4) % 26 + base
            result.append(chr(shifted))
        else:
            result.append(char)
    return ''.join(result)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/80 · pass · 100.0% · 1423ms · 8c46a24a3a54
Question
def is_happy(s):
"""You are given a string s.
Your task is to check if the string is happy or not.
A string is happy if its length is at least 3 and every 3 consecutive letters are distinct
For example:
is_happy(a) => False
is_happy(aa) => False
is_happy(abcd) => True
is_happy(aabb) => False
is_happy(adb) => True
is_happy(xyy) => False
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_happy(s):
"""You are given a string s.
Your task is to check if the string is happy or not.
A string is happy if its length is at least 3 and every 3 consecutive letters are distinct
For example:
is_happy(a) => False
is_happy(aa) => False
is_happy(abcd) => True
is_happy(aabb) => False
is_happy(adb) => True
is_happy(xyy) => False
"""
Model answer
def is_happy(s):
    if len(s) < 3:
        return False
    for i in range(len(s) - 2):
        if s[i] == s[i+1] or s[i+1] == s[i+2] or s[i] == s[i+2]:
            return False
    return True
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
Incorrect samples
sample 16 · humaneval-plus:HumanEval/99 · fail · 0.0% · 1145ms · 125aa68cf8ce
Question
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Model answer
def closest_integer(value):
    num = float(value)
    if num >= 0:
        return int(num + 0.5)
    else:
        return -int(-num + 0.5)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
  File "/tmp/tmp44ie7p01/solution.py", line 1
    from two integers, round it away from zero.
             ^^^^^^^^
SyntaxError: invalid syntax
Extracted: fail · Gold: pass
sample 23 · humaneval-plus:HumanEval/145 · fail · 0.0% · 2744ms · d5a81ec8d1f0
Question
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Model answer
def order_by_points(nums):
    """
    Write a function which sorts the given list of integers
    in ascending order according to the sum of their digits.
    Note: if there are several items with similar sum of their digits,
    order them based on their index in original list.
    For example:
    >>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
    >>> order_by_points([]) == []
    """
    def digit_sum(n):
        return sum(int(d) for d in str(abs(n)))
    return sorted(nums, key=lambda x: (digit_sum(x), nums.index(x)))
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmpyq9nrmax/solution.py", line 50, in <module>
    check(order_by_points)
  File "/tmp/tmpyq9nrmax/solution.py", line 47, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmpyq9nrmax/solution.py", line 40, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted: fail · Gold: pass
sample 27 · humaneval-plus:HumanEval/151 · fail · 0.0% · 1304ms · f97a9b534c92
Question
def double_the_difference(lst):
'''
Given a list of numbers, return the sum of squares of the numbers
in the list that are odd. Ignore numbers that are negative or not integers.
double_the_difference([1, 3, 2, 0]) == 1 + 9 + 0 + 0 = 10
double_the_difference([-1, -2, 0]) == 0
double_the_difference([9, -2]) == 81
double_the_difference([0]) == 0
If the input list is empty, return 0.
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def double_the_difference(lst):
'''
Given a list of numbers, return the sum of squares of the numbers
in the list that are odd. Ignore numbers that are negative or not integers.
double_the_difference([1, 3, 2, 0]) == 1 + 9 + 0 + 0 = 10
double_the_difference([-1, -2, 0]) == 0
double_the_difference([9, -2]) == 81
double_the_difference([0]) == 0
If the input list is empty, return 0.
'''
Model answer
def double_the_difference(lst):
    total = 0
    for num in lst:
        if isinstance(num, int) and not isinstance(num, bool) and num >= 0 and num % 2 != 0:
            total += num * num
    return total
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmptfnhfqjb/solution.py", line 41, in <module>
    check(double_the_difference)
  File "/tmp/tmptfnhfqjb/solution.py", line 38, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmptfnhfqjb/solution.py", line 31, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted: fail · Gold: pass
sample 28 · humaneval-plus:HumanEval/76 · fail · 0.0% · 8140ms · baea024f4bf6
Question
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Model answer
def is_simple_power(x, n):
    """Your task is to write a function that returns true if a number x is a simple
    power of n and false in other cases.
    x is a simple power of n if n**int=x
    For example:
    is_simple_power(1, 4) => true
    is_simple_power(2, 2) => true
    is_simple_power(8, 2) => true
    is_simple_power(3, 2) => false
    is_simple_power(3, 1) => false
    is_simple_power(5, 3) => false
    """
    if n == 0:
        return x == 0
    if n == 1:
        return x == 1
    if n == -1:
        return x == 1 or x == -1
    if x == 0:
        return False
    if x == 1:
        return True
    if x == -1:
        return n == -1
    # Handle negative x
    if x < 0:
        if n > 0:
            return False
        # n is negative, x is negative
        # n**k = x < 0 means k must be odd
        # Let's check if |x| is a power of |n| with an odd exponent
        abs_x = -x
        abs_n = -n
        k = 0
        while abs_x > 1:
            if abs_x % abs_n != 0:
                return False
            abs_x //= abs_n
            k += 1
        return k % 2 == 1
    # x > 0, n != 0, n != 1, n != -1
    if n < 0:
        # n is negative, x is positive
        # n**k = x > 0 means k must be even
        abs_n = -n
        k = 0
        temp = x
        while temp > 1:
            if temp % abs_n != 0:
                return False
            temp //= abs_n
            k += 1
        return k % 2 == 0
    # n > 0, x > 0
    k = 0
    temp = x
    while temp > 1:
        if temp % n != 0:
            return False
        temp //= n
        k += 1
    return True
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmphht9heiw/solution.py", line 100, in <module>
    check(is_simple_power)
  File "/tmp/tmphht9heiw/solution.py", line 97, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmphht9heiw/solution.py", line 90, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted: fail · Gold: pass
sample 33 · humaneval-plus:HumanEval/93 · fail · 0.0% · 3806ms · b63a4c22d0c7
Question
def encode(message):
"""
Write a function that takes a message, and encodes in such a
way that it swaps case of all letters, replaces all vowels in
the message with the letter that appears 2 places ahead of that
vowel in the english alphabet.
Assume only letters.
Examples:
>>> encode('test')
'TGST'
>>> encode('This is a message')
'tHKS KS C MGSSCGG'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def encode(message):
"""
Write a function that takes a message, and encodes in such a
way that it swaps case of all letters, replaces all vowels in
the message with the letter that appears 2 places ahead of that
vowel in the english alphabet.
Assume only letters.
Examples:
>>> encode('test')
'TGST'
>>> encode('This is a message')
'tHKS KS C MGSSCGG'
"""
Model answer
def encode(message):
    """
    Write a function that takes a message, and encodes in such a
    way that it swaps case of all letters, replaces all vowels in
    the message with the letter that appears 2 places ahead of that
    vowel in the english alphabet.
    Assume only letters.
    Examples:
    >>> encode('test')
    'TGST'
    >>> encode('This is a message')
    'tHKS KS C MGSSCGG'
    """
    vowels = 'aeiouAEIOU'
    vowel_map = {
        'a': 'c', 'e': 'g', 'i': 'k', 'o': 'q', 'u': 'w',
        'A': 'C', 'E': 'G', 'I': 'K', 'O': 'Q', 'U': 'W'
    }
    result = []
    for char in message:
        if char in vowels:
            result.append(vowel_map[char])
        else:
            result.append(char.swapcase())
    return ''.join(result)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmp5fudm99p/solution.py", line 61, in <module>
    check(encode)
  File "/tmp/tmp5fudm99p/solution.py", line 58, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmp5fudm99p/solution.py", line 51, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted: fail · Gold: pass
by Stewart_of_Mars · shard 8 · 6/29/2026, 3:42:49 AM · cmqyo9t2t08tmpk01zhuwhaa8
93.1% · 108/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/15 · pass · 100.0% · 12297ms · a395ec9832f4
Question
def string_sequence(n: int) -> str:
""" Return a string containing space-delimited numbers starting from 0 upto n inclusive.
>>> string_sequence(0)
'0'
>>> string_sequence(5)
'0 1 2 3 4 5'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def string_sequence(n: int) -> str:
""" Return a string containing space-delimited numbers starting from 0 upto n inclusive.
>>> string_sequence(0)
'0'
>>> string_sequence(5)
'0 1 2 3 4 5'
"""
Model answer
def string_sequence(n: int) -> str:
    """ Return a string containing space-delimited numbers starting from 0 upto n inclusive.
    >>> string_sequence(0)
    '0'
    >>> string_sequence(5)
    '0 1 2 3 4 5'
    """
    return ' '.join(map(str, range(n + 1)))
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/42 · pass · 100.0% · 6760ms · 391ef2e87f96
Question
def incr_list(l: list):
"""Return list with elements incremented by 1.
>>> incr_list([1, 2, 3])
[2, 3, 4]
>>> incr_list([5, 3, 5, 2, 3, 3, 9, 0, 123])
[6, 4, 6, 3, 4, 4, 10, 1, 124]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def incr_list(l: list):
"""Return list with elements incremented by 1.
>>> incr_list([1, 2, 3])
[2, 3, 4]
>>> incr_list([5, 3, 5, 2, 3, 3, 9, 0, 123])
[6, 4, 6, 3, 4, 4, 10, 1, 124]
"""
Model answer
def incr_list(l: list):
    """Return list with elements incremented by 1.
    >>> incr_list([1, 2, 3])
    [2, 3, 4]
    >>> incr_list([5, 3, 5, 2, 3, 3, 9, 0, 123])
    [6, 4, 6, 3, 4, 4, 10, 1, 124]
    """
    return [x + 1 for x in l]
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/126 · pass · 100.0% · 18116ms · 17f6d5efd852
Question
def is_sorted(lst):
'''
Given a list of numbers, return whether or not they are sorted
in ascending order. If list has more than 1 duplicate of the same
number, return False. Assume no negative numbers and only integers.
Examples
is_sorted([5]) ➞ True
is_sorted([1, 2, 3, 4, 5]) ➞ True
is_sorted([1, 3, 2, 4, 5]) ➞ False
is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_sorted(lst):
'''
Given a list of numbers, return whether or not they are sorted
in ascending order. If list has more than 1 duplicate of the same
number, return False. Assume no negative numbers and only integers.
Examples
is_sorted([5]) ➞ True
is_sorted([1, 2, 3, 4, 5]) ➞ True
is_sorted([1, 3, 2, 4, 5]) ➞ False
is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
'''
Model answer
def is_sorted(lst):
    '''
    Given a list of numbers, return whether or not they are sorted
    in ascending order. If list has more than 1 duplicate of the same
    number, return False. Assume no negative numbers and only integers.
    Examples
    is_sorted([5]) ➞ True
    is_sorted([1, 2, 3, 4, 5]) ➞ True
    is_sorted([1, 3, 2, 4, 5]) ➞ False
    is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
    is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
    is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
    is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
    is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
    '''
    # Check if sorted in ascending order
    for i in range(len(lst) - 1):
        if lst[i] > lst[i + 1]:
            return False
    # Check if any number appears more than twice
    counts = {}
    for num in lst:
        counts[num] = counts.get(num, 0) + 1
        if counts[num] > 2:
            return False
    return True
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/61 · pass · 100.0% · 7336ms · a41d7abf5673
Question
def correct_bracketing(brackets: str):
""" brackets is a string of "(" and ")".
return True if every opening bracket has a corresponding closing bracket.
>>> correct_bracketing("(")
False
>>> correct_bracketing("()")
True
>>> correct_bracketing("(()())")
True
>>> correct_bracketing(")(()")
False
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def correct_bracketing(brackets: str):
""" brackets is a string of "(" and ")".
return True if every opening bracket has a corresponding closing bracket.
>>> correct_bracketing("(")
False
>>> correct_bracketing("()")
True
>>> correct_bracketing("(()())")
True
>>> correct_bracketing(")(()")
False
"""
Model answer
def correct_bracketing(brackets: str):
    """ brackets is a string of "(" and ")".
    return True if every opening bracket has a corresponding closing bracket.
    >>> correct_bracketing("(")
    False
    >>> correct_bracketing("()")
    True
    >>> correct_bracketing("(()())")
    True
    >>> correct_bracketing(")(()")
    False
    """
    balance = 0
    for char in brackets:
        if char == '(':
            balance += 1
        elif char == ')':
            balance -= 1
        if balance < 0:
            return False
    return balance == 0
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/161 · pass · 100.0% · 7588ms · 6906a59e80a9
Question
def solve(s):
"""You are given a string s.
if s[i] is a letter, reverse its case from lower to upper or vise versa,
otherwise keep it as it is.
If the string contains no letters, reverse the string.
The function should return the resulted string.
Examples
solve("1234") = "4321"
solve("ab") = "AB"
solve("#a@C") = "#A@c"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def solve(s):
"""You are given a string s.
if s[i] is a letter, reverse its case from lower to upper or vise versa,
otherwise keep it as it is.
If the string contains no letters, reverse the string.
The function should return the resulted string.
Examples
solve("1234") = "4321"
solve("ab") = "AB"
solve("#a@C") = "#A@c"
"""
Model answer
def solve(s):
    """You are given a string s.
    if s[i] is a letter, reverse its case from lower to upper or vise versa,
    otherwise keep it as it is.
    If the string contains no letters, reverse the string.
    The function should return the resulted string.
    Examples
    solve("1234") = "4321"
    solve("ab") = "AB"
    solve("#a@C") = "#A@c"
    """
    if not any(c.isalpha() for c in s):
        return s[::-1]
    return s.swapcase()
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted: pass · Gold: pass
Incorrect samples
sample 24 · humaneval-plus:HumanEval/116 · fail · 0.0% · 16666ms · e424308979c4
Question
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Model answer
def sort_array(arr):
    """
    In this Kata, you have to sort an array of non-negative integers according to
    number of ones in their binary representation in ascending order.
    For similar number of ones, sort based on decimal value.
    It must be implemented like this:
    >>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
    >>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
    >>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
    """
    # bin(x).count('1') counts the set bits for non-negative integers.
    # For negative integers, the problem description mentions "non-negative",
    # but the examples include negatives. In Python, bin(-1) is '-0b1',
    # so we use the absolute value or a bitmask if specific 2's complement
    # behavior is needed. Based on the provided examples, sorting by
    # (bit_count, value) handles the requirements.
    return sorted(arr, key=lambda x: (bin(x).count('1') if x >= 0 else bin(x & 0xFFFFFFFF).count('1'), x))
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmpygivxh36/solution.py", line 53, in <module>
    check(sort_array)
  File "/tmp/tmpygivxh36/solution.py", line 50, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmpygivxh36/solution.py", line 43, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted: fail · Gold: pass
sample 25 · humaneval-plus:HumanEval/129 · fail · 0.0% · 22831ms · b1ae63818edc
Question
def minPath(grid, k):
"""
Given a grid with N rows and N columns (N >= 2) and a positive integer k,
each cell of the grid contains a value. Every integer in the range [1, N * N]
inclusive appears exactly once on the cells of the grid.
You have to find the minimum path of length k in the grid. You can start
from any cell, and in each step you can move to any of the neighbor cells,
in other words, you can go to cells which share an edge with you current
cell.
Please note that a path of length k means visiting exactly k cells (not
necessarily distinct).
You CANNOT go off the grid.
A path A (of length k) is considered less than a path B (of length k) if
after making the ordered lists of the values on the cells that A and B go
through (let's call them lst_A and lst_B), lst_A is lexicographically less
than lst_B, in other words, there exist an integer index i (1 <= i <= k)
such that lst_A[i] < lst_B[i] and for any j (1 <= j < i) we have
lst_A[j] = lst_B[j].
It is guaranteed that the answer is unique.
Return an ordered list of the values on the cells that the minimum path go through.
Examples:
Input: grid = [ [1,2,3], [4,5,6], [7,8,9]], k = 3
Output: [1, 2, 1]
Input: grid = [ [5,9,3], [4,1,6], [7,8,2]], k = 1
Output: [1]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def minPath(grid, k):
"""
Given a grid with N rows and N columns (N >= 2) and a positive integer k,
each cell of the grid contains a value. Every integer in the range [1, N * N]
inclusive appears exactly once on the cells of the grid.
You have to find the minimum path of length k in the grid. You can start
from any cell, and in each step you can move to any of the neighbor cells,
in other words, you can go to cells which share an edge with you current
cell.
Please note that a path of length k means visiting exactly k cells (not
necessarily distinct).
You CANNOT go off the grid.
A path A (of length k) is considered less than a path B (of length k) if
after making the ordered lists of the values on the cells that A and B go
through (let's call them lst_A and lst_B), lst_A is lexicographically less
than lst_B, in other words, there exist an integer index i (1 <= i <= k)
such that lst_A[i] < lst_B[i] and for any j (1 <= j < i) we have
lst_A[j] = lst_B[j].
It is guaranteed that the answer is unique.
Return an ordered list of the values on the cells that the minimum path go through.
Examples:
Input: grid = [ [1,2,3], [4,5,6], [7,8,9]], k = 3
Output: [1, 2, 1]
Input: grid = [ [5,9,3], [4,1,6], [7,8,2]], k = 1
Output: [1]
"""
Model answer
def minPath(grid, k):
    """
    Given a grid with N rows and N columns (N >= 2) and a positive integer k,
    each cell of the grid contains a value. Every integer in the range [1, N * N]
    inclusive appears exactly once on the cells of the grid.
    """
    n = len(grid)
    # Find the position of the minimum value in the grid
    min_val = float('inf')
    start_pos = (0, 0)
    for r in range(n):
        for c in range(n):
            if grid[r][c] < min_val:
                min_val = grid[r][c]
                start_pos = (r, c)
    # If k == 1, the path is just the minimum element
    if k == 1:
        return [min_val]
    # To find the lexicographically smallest path of length k,
    # we want to start at the smallest possible value.
    # Then we want the second value to be the smallest possible neighbor.
    # Once we reach the smallest possible neighbor, the best strategy
    # to keep the path lexicographically smallest is to oscillate
    # between the smallest value and its smallest neighbor.
    r, c = start_pos
    min_neighbor = float('inf')
    for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:
            if grid[nr][nc] < min_neighbor:
                min_neighbor = grid[nr][nc]
    # The path will be [min_val, min_neighbor, min_val, min_neighbor, ...]
    path = []
    for i in range(k):
        if i % 2 == 0:
            path.append(min_val)
        else:
            path.append(min_neighbor)
    return path
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
File "/tmp/tmp0orlm276/solution.py", line 1
from any cell, and in each step you can move to any of the neighbor cells,
^^^^
SyntaxError: invalid syntax
Extracted:
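The SyntaxError above is reported at line 1 of solution.py, on a sentence from the task docstring that does not appear in the recorded model answer. This suggests, though the page does not confirm it, that the file executed by the sandbox began partway through the docstring rather than at the `def` line. The sketch below shows how such a file could be told apart from a well-formed one using the standard `ast` module. The helper name `is_valid_python` and both sample strings are illustrative assumptions, not part of the harness.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if the text parses as Python source, False on a SyntaxError."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Hypothetical stand-in for a correctly assembled solution file.
complete_source = 'def minPath(grid, k):\n    """Docstring."""\n    return []\n'

# First line quoted in the traceback: docstring prose outside any string literal.
truncated_source = (
    "from any cell, and in each step you can move to any of the neighbor cells,\n"
)

print(is_valid_python(complete_source))   # True
print(is_valid_python(truncated_source))  # False
```

Python reads the truncated line as the start of a `from ... import` statement and rejects it at the word `cell`, which matches the position of the caret in the trace.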
fail · Gold: pass
sample 26 · humaneval-plus:HumanEval/39 · fail · 0.0% · 31009ms · c5befd80bb93
Question
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Model answer
def prime_fib(n: int):
    """
    prime_fib returns n-th number that is a Fibonacci number and it's also prime.
    >>> prime_fib(1)
    2
    >>> prime_fib(2)
    3
    >>> prime_fib(3)
    5
    >>> prime_fib(4)
    13
    >>> prime_fib(5)
    89
    """
    def is_prime(num):
        if num < 2:
            return False
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                return False
        return True
    count = 0
    a, b = 0, 1
    while True:
        if is_prime(a):
            count += 1
            if count == n:
                return a
        a, b = b, a + b
Reasoning trace
sandbox: passed=false returncode=-24 timed_out=false
Extracted:
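The sandbox line above reports returncode=-24 with timed_out=false. If the sandbox launches the solution through Python's `subprocess` module on a POSIX host (an assumption; the page does not say how the process is started), a negative return code -N means the child was terminated by signal N. On common Linux hosts signal 24 is SIGXCPU, the CPU time limit signal. A minimal sketch for decoding such codes, where the helper name `describe_returncode` is hypothetical:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Describe a subprocess-style return code; negative values mean 'killed by signal'."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    signum = -returncode
    try:
        # Signal numbers are platform-specific, so look the name up at runtime.
        name = signal.Signals(signum).name
    except ValueError:
        name = f"unknown signal {signum}"
    return f"terminated by {name}"


print(describe_returncode(1))
print(describe_returncode(-24))
```

Under that reading, this run was stopped by a resource limit rather than by a wrong answer or by the harness's own wall-clock timeout.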
fail · Gold: pass
sample 29 · humaneval-plus:HumanEval/103 · fail · 0.0% · 18262ms · 9fac21aafd37
Question
def rounded_avg(n, m):
"""You are given two positive integers n and m, and your task is to compute the
average of the integers from n through m (including n and m).
Round the answer to the nearest integer and convert that to binary.
If n is greater than m, return -1.
Example:
rounded_avg(1, 5) => "0b11"
rounded_avg(7, 5) => -1
rounded_avg(10, 20) => "0b1111"
rounded_avg(20, 33) => "0b11010"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def rounded_avg(n, m):
"""You are given two positive integers n and m, and your task is to compute the
average of the integers from n through m (including n and m).
Round the answer to the nearest integer and convert that to binary.
If n is greater than m, return -1.
Example:
rounded_avg(1, 5) => "0b11"
rounded_avg(7, 5) => -1
rounded_avg(10, 20) => "0b1111"
rounded_avg(20, 33) => "0b11010"
"""
Model answer
def rounded_avg(n, m):
    """You are given two positive integers n and m, and your task is to compute the
    average of the integers from n through m (including n and m).
    Round the answer to the nearest integer and convert that to binary.
    If n is greater than m, return -1.
    Example:
    rounded_avg(1, 5) => "0b11"
    rounded_avg(7, 5) => -1
    rounded_avg(10, 20) => "0b1111"
    rounded_avg(20, 33) => "0b11010"
    """
    if n > m:
        return -1
    # The sum of integers from n to m is (n + m) * (m - n + 1) / 2
    # The average is the sum divided by the number of elements (m - n + 1)
    # Average = (n + m) / 2
    avg = (n + m) / 2
    # Round to the nearest integer.
    # Python's round() rounds to the nearest even number for .5 cases.
    # Standard mathematical rounding (round half up) is usually expected in these challenges.
    # However, for (n+m)/2, the result is always .0 or .5.
    # int(avg + 0.5) provides the standard "round half up" behavior.
    rounded_val = int(avg + 0.5)
    return bin(rounded_val)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpfkgywgno/solution.py", line 62, in <module>
check(rounded_avg)
File "/tmp/tmpfkgywgno/solution.py", line 59, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpfkgywgno/solution.py", line 52, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
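The docstring's last example already separates the two rounding rules: the average of 20 through 33 is 26.5, and the expected string "0b11010" is 26, not 27. The sketch below compares the half-up rounding used in the recorded answer with Python's built-in `round()`, which sends ties to the even neighbour. Both function names are illustrative, and whether this tie case is the one that tripped the hidden assertion is not shown on the page.

```python
def rounded_avg_half_up(n, m):
    """Rounding as written in the recorded answer: ties always go up."""
    if n > m:
        return -1
    return bin(int((n + m) / 2 + 0.5))


def rounded_avg_builtin(n, m):
    """Same average, rounded with round(), which sends ties to the even integer."""
    if n > m:
        return -1
    return bin(round((n + m) / 2))


print(rounded_avg_half_up(20, 33))  # 0b11011, i.e. 27
print(rounded_avg_builtin(20, 33))  # 0b11010, i.e. 26, the docstring's value
```

The two variants agree whenever n + m is even, so the other docstring examples pass with either rule.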
fail · Gold: pass
sample 40 · humaneval-plus:HumanEval/99 · fail · 0.0% · 20644ms · 125aa68cf8ce
Question
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Model answer
def closest_integer(value):
    '''
    Create a function that takes a value (string) representing a number
    and returns the closest integer to it. If the number is equidistant
    from two integers, round it away from zero.
    Examples
    >>> closest_integer("10")
    10
    >>> closest_integer("15.3")
    15
    Note:
    Rounding away from zero means that if the given number is equidistant
    from two integers, the one you should return is the one that is the
    farthest from zero. For example closest_integer("14.5") should
    return 15 and closest_integer("-14.5") should return -15.
    '''
    import decimal
    # Use Decimal for precise representation of the string value
    num = decimal.Decimal(value)
    # ROUND_HALF_UP rounds towards the nearest neighbor.
    # If equidistant, it rounds away from zero for positive numbers.
    # However, for negative numbers, ROUND_HALF_UP rounds towards zero (e.g., -1.5 -> -1).
    # To achieve "away from zero" for both, we use ROUND_HALF_UP on the absolute value.
    rounded = num.quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)
    # Decimal's ROUND_HALF_UP actually rounds 0.5 to 1 and -0.5 to 0.
    # To strictly follow "away from zero" for negatives (-14.5 -> -15):
    if num < 0:
        # For negative numbers, we can use ROUND_HALF_DOWN to push -0.5 to -1
        rounded = num.quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_DOWN)
    return int(rounded)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpfz0khol_/solution.py", line 72, in <module>
check(closest_integer)
File "/tmp/tmpfz0khol_/solution.py", line 69, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpfz0khol_/solution.py", line 62, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
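The comments in the recorded answer describe `decimal.ROUND_HALF_UP` as rounding negative ties toward zero. In Python's `decimal` module it is the other way round: ROUND_HALF_UP sends ties away from zero for both signs, and ROUND_HALF_DOWN sends them toward zero. The short check below illustrates this with the docstring's own values; the helper name `quantize_to_int` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def quantize_to_int(text, rounding):
    """Round a decimal string to an integer using the given decimal rounding mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=rounding))


print(quantize_to_int("14.5", ROUND_HALF_UP))     # 15
print(quantize_to_int("-14.5", ROUND_HALF_UP))    # -15
print(quantize_to_int("-14.5", ROUND_HALF_DOWN))  # -14
```

Because the recorded answer switches to ROUND_HALF_DOWN for negative inputs, it returns -14 for "-14.5", which contradicts the docstring's stated -15.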
fail · Gold: pass
by Stewart_of_Mars · shard 7 · 6/29/2026, 3:40:32 AM · cmqyo6vcf08q0pk01sdic5xke
92.2% · 107/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/55pass · 100.0% · 5520ms · a266c8b77394
Question
def fib(n: int):
"""Return n-th Fibonacci number.
>>> fib(10)
55
>>> fib(1)
1
>>> fib(8)
21
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def fib(n: int):
"""Return n-th Fibonacci number.
>>> fib(10)
55
>>> fib(1)
1
>>> fib(8)
21
"""
Model answer
def fib(n: int):
    """Return n-th Fibonacci number.
    >>> fib(10)
    55
    >>> fib(1)
    1
    >>> fib(8)
    21
    """
    if n <= 0:
        return 0
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/130 · pass · 100.0% · 37243ms · f089dd9db4e3
Question
def tri(n):
"""Everyone knows Fibonacci sequence, it was studied deeply by mathematicians in
the last couple centuries. However, what people don't know is Tribonacci sequence.
Tribonacci sequence is defined by the recurrence:
tri(1) = 3
tri(n) = 1 + n / 2, if n is even.
tri(n) = tri(n - 1) + tri(n - 2) + tri(n + 1), if n is odd.
For example:
tri(2) = 1 + (2 / 2) = 2
tri(4) = 3
tri(3) = tri(2) + tri(1) + tri(4)
= 2 + 3 + 3 = 8
You are given a non-negative integer number n, you have to a return a list of the
first n + 1 numbers of the Tribonacci sequence.
Examples:
tri(3) = [1, 3, 2, 8]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def tri(n):
"""Everyone knows Fibonacci sequence, it was studied deeply by mathematicians in
the last couple centuries. However, what people don't know is Tribonacci sequence.
Tribonacci sequence is defined by the recurrence:
tri(1) = 3
tri(n) = 1 + n / 2, if n is even.
tri(n) = tri(n - 1) + tri(n - 2) + tri(n + 1), if n is odd.
For example:
tri(2) = 1 + (2 / 2) = 2
tri(4) = 3
tri(3) = tri(2) + tri(1) + tri(4)
= 2 + 3 + 3 = 8
You are given a non-negative integer number n, you have to a return a list of the
first n + 1 numbers of the Tribonacci sequence.
Examples:
tri(3) = [1, 3, 2, 8]
"""
Model answer
def tri(n):
    """Everyone knows Fibonacci sequence, it was studied deeply by mathematicians in
    the last couple centuries. However, what people don't know is Tribonacci sequence.
    Tribonacci sequence is defined by the recurrence:
    tri(1) = 3
    tri(n) = 1 + n / 2, if n is even.
    tri(n) = tri(n - 1) + tri(n - 2) + tri(n + 1), if n is odd.
    For example:
    tri(2) = 1 + (2 / 2) = 2
    tri(4) = 3
    tri(3) = tri(2) + tri(1) + tri(4)
           = 2 + 3 + 3 = 8
    You are given a non-negative integer number n, you have to a return a list of the
    first n + 1 numbers of the Tribonacci sequence.
    Examples:
    tri(3) = [1, 3, 2, 8]
    """
    # The problem description implies a sequence where values depend on
    # indices. Let's define a helper to get the value at index i.
    # Based on the examples:
    # tri(0) = 1 (implied by tri(3) = [1, 3, 2, 8])
    # tri(1) = 3
    # tri(2) = 1 + 2/2 = 2
    # tri(3) = tri(2) + tri(1) + tri(4)
    # tri(4) = 1 + 4/2 = 3
    # Since tri(n) for odd n depends on tri(n+1), we must calculate
    # even indices first or handle the dependency.
    res = [0] * (n + 1)
    # Base case for index 0 as per example tri(3) = [1, 3, 2, 8]
    if n >= 0:
        res[0] = 1
    # We need to calculate values up to n.
    # However, if n is odd, we need tri(n+1) to find tri(n).
    # Let's calculate values for indices up to n + 1 if n is odd.
    limit = n + 1 if n % 2 != 0 else n
    # Temporary list to hold values up to limit
    vals = [0] * (limit + 1)
    if limit >= 0: vals[0] = 1
    for i in range(1, limit + 1):
        if i == 1:
            vals[i] = 3
        elif i % 2 == 0:
            vals[i] = 1 + i // 2
        else:
            # i is odd, depends on i-1, i-2, and i+1
            # Since i+1 is even, we can calculate it immediately
            val_plus_1 = 1 + (i + 1) // 2
            vals[i] = vals[i-1] + vals[i-2] + val_plus_1
    return vals[:n+1]
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/43 · pass · 100.0% · 11271ms · a143c2ff3506
Question
def pairs_sum_to_zero(l):
"""
pairs_sum_to_zero takes a list of integers as an input.
it returns True if there are two distinct elements in the list that
sum to zero, and False otherwise.
>>> pairs_sum_to_zero([1, 3, 5, 0])
False
>>> pairs_sum_to_zero([1, 3, -2, 1])
False
>>> pairs_sum_to_zero([1, 2, 3, 7])
False
>>> pairs_sum_to_zero([2, 4, -5, 3, 5, 7])
True
>>> pairs_sum_to_zero([1])
False
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def pairs_sum_to_zero(l):
"""
pairs_sum_to_zero takes a list of integers as an input.
it returns True if there are two distinct elements in the list that
sum to zero, and False otherwise.
>>> pairs_sum_to_zero([1, 3, 5, 0])
False
>>> pairs_sum_to_zero([1, 3, -2, 1])
False
>>> pairs_sum_to_zero([1, 2, 3, 7])
False
>>> pairs_sum_to_zero([2, 4, -5, 3, 5, 7])
True
>>> pairs_sum_to_zero([1])
False
"""
Model answer
def pairs_sum_to_zero(l):
    """
    pairs_sum_to_zero takes a list of integers as an input.
    it returns True if there are two distinct elements in the list that
    sum to zero, and False otherwise.
    >>> pairs_sum_to_zero([1, 3, 5, 0])
    False
    >>> pairs_sum_to_zero([1, 3, -2, 1])
    False
    >>> pairs_sum_to_zero([1, 2, 3, 7])
    False
    >>> pairs_sum_to_zero([2, 4, -5, 3, 5, 7])
    True
    >>> pairs_sum_to_zero([1])
    False
    """
    seen = set()
    for num in l:
        if -num in seen:
            return True
        seen.add(num)
    return False
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/59 · pass · 100.0% · 6210ms · d72103fae8b1
Question
def largest_prime_factor(n: int):
"""Return the largest prime factor of n. Assume n > 1 and is not a prime.
>>> largest_prime_factor(13195)
29
>>> largest_prime_factor(2048)
2
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def largest_prime_factor(n: int):
"""Return the largest prime factor of n. Assume n > 1 and is not a prime.
>>> largest_prime_factor(13195)
29
>>> largest_prime_factor(2048)
2
"""
Model answer
def largest_prime_factor(n: int):
    """Return the largest prime factor of n. Assume n > 1 and is not a prime.
    >>> largest_prime_factor(13195)
    29
    >>> largest_prime_factor(2048)
    2
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
        else:
            d += 1
    return n
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/19 · pass · 100.0% · 10873ms · 41017e761442
Question
from typing import List
def sort_numbers(numbers: str) -> str:
""" Input is a space-delimited string of numberals from 'zero' to 'nine'.
Valid choices are 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight' and 'nine'.
Return the string with numbers sorted from smallest to largest
>>> sort_numbers('three one five')
'one three five'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
from typing import List
def sort_numbers(numbers: str) -> str:
""" Input is a space-delimited string of numberals from 'zero' to 'nine'.
Valid choices are 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight' and 'nine'.
Return the string with numbers sorted from smallest to largest
>>> sort_numbers('three one five')
'one three five'
"""
Model answer
from typing import List
def sort_numbers(numbers: str) -> str:
    """ Input is a space-delimited string of numberals from 'zero' to 'nine'.
    Valid choices are 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight' and 'nine'.
    Return the string with numbers sorted from smallest to largest
    >>> sort_numbers('three one five')
    'one three five'
    """
    mapping = {
        'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
        'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9
    }
    words = numbers.split()
    words.sort(key=lambda x: mapping[x])
    return ' '.join(words)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 25 · humaneval-plus:HumanEval/99fail · 0.0% · 12573ms · 125aa68cf8ce
Question
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Model answer
def closest_integer(value):
    '''
    Create a function that takes a value (string) representing a number
    and returns the closest integer to it. If the number is equidistant
    from two integers, round it away from zero.
    Examples
    >>> closest_integer("10")
    10
    >>> closest_integer("15.3")
    15
    Note:
    Rounding away from zero means that if the given number is equidistant
    from two integers, the one you should return is the one that is the
    farthest from zero. For example closest_integer("14.5") should
    return 15 and closest_integer("-14.5") should return -15.
    '''
    import decimal
    # Use decimal module to avoid floating point precision issues
    # ROUND_HALF_UP rounds towards the nearest neighbor, with ties going away from zero
    num = decimal.Decimal(value)
    rounded = num.quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)
    return int(rounded)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp8am6s8t_/solution.py", line 60, in <module>
check(closest_integer)
File "/tmp/tmp8am6s8t_/solution.py", line 57, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp8am6s8t_/solution.py", line 50, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 50 · humaneval-plus:HumanEval/76 · fail · 0.0% · 11545ms · baea024f4bf6
Question
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Model answer
def is_simple_power(x, n):
    """Your task is to write a function that returns true if a number x is a simple
    power of n and false in other cases.
    x is a simple power of n if n**int=x
    For example:
    is_simple_power(1, 4) => true
    is_simple_power(2, 2) => true
    is_simple_power(8, 2) => true
    is_simple_power(3, 2) => false
    is_simple_power(3, 1) => false
    is_simple_power(5, 3) => false
    """
    if x == 1:
        return True
    if n == 1:
        return x == 1
    if n == 0:
        return x == 0
    power = 1
    while power < x:
        power *= n
    return power == x
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpjh4wj2ap/solution.py", line 59, in <module>
check(is_simple_power)
File "/tmp/tmpjh4wj2ap/solution.py", line 56, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpjh4wj2ap/solution.py", line 49, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
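The trace does not say which input failed. One class of input the recorded loop does not cover is a negative base, because `while power < x` assumes `power` grows toward `x`. The sketch below copies the recorded logic under the hypothetical name `is_simple_power_recorded` and probes it with x = -8, n = -2, which satisfies the docstring's definition since (-2)**3 == -8. Whether the hidden tests contain such inputs is an assumption.

```python
def is_simple_power_recorded(x, n):
    """Copy of the recorded answer's logic, renamed for illustration."""
    if x == 1:
        return True
    if n == 1:
        return x == 1
    if n == 0:
        return x == 0
    power = 1
    while power < x:
        power *= n
    return power == x


print(is_simple_power_recorded(8, 2))    # True
print(is_simple_power_recorded(-8, -2))  # False, although (-2) ** 3 == -8
```

With n = -1 and x greater than 1, `power` alternates between 1 and -1 and the loop never ends, so that case should not be run without a guard.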
fail · Gold: pass
sample 51 · humaneval-plus:HumanEval/97 · fail · 0.0% · 7188ms · b7573e2145a3
Question
def multiply(a, b):
"""Complete the function that takes two integers and returns
the product of their unit digits.
Assume the input is always valid.
Examples:
multiply(148, 412) should return 16.
multiply(19, 28) should return 72.
multiply(2020, 1851) should return 0.
multiply(14,-15) should return 20.
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def multiply(a, b):
"""Complete the function that takes two integers and returns
the product of their unit digits.
Assume the input is always valid.
Examples:
multiply(148, 412) should return 16.
multiply(19, 28) should return 72.
multiply(2020, 1851) should return 0.
multiply(14,-15) should return 20.
"""
Model answer
def multiply(a, b):
    """Complete the function that takes two integers and returns
    the product of their unit digits.
    Assume the input is always valid.
    Examples:
    multiply(148, 412) should return 16.
    multiply(19, 28) should return 72.
    multiply(2020, 1851) should return 0.
    multiply(14,-15) should return 20.
    """
    return abs(a % 10) * abs(b % 10)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpq5j_itew/solution.py", line 46, in <module>
check(multiply)
File "/tmp/tmpq5j_itew/solution.py", line 43, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpq5j_itew/solution.py", line 36, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
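The recorded one-liner takes the remainder before the absolute value. In Python, `%` with a positive modulus returns a non-negative result, so `-13 % 10` is 7 rather than 3. The docstring's negative example happens to hide this because `-15 % 10` is 5 either way. The sketch below contrasts the two orderings; the function names are illustrative, and it is an assumption that the failing test used a negative operand of this kind.

```python
def unit_digit_mod_first(x):
    """Ordering used in the recorded answer: remainder first, then abs()."""
    return abs(x % 10)


def unit_digit_abs_first(x):
    """Alternative ordering: abs() first, then remainder."""
    return abs(x) % 10


def multiply_abs_first(a, b):
    """Product of unit digits using the abs-first ordering."""
    return unit_digit_abs_first(a) * unit_digit_abs_first(b)


print(unit_digit_mod_first(-15), unit_digit_abs_first(-15))  # 5 5
print(unit_digit_mod_first(-13), unit_digit_abs_first(-13))  # 7 3
print(multiply_abs_first(14, -15))                           # 20
```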
fail · Gold: pass
sample 61 · humaneval-plus:HumanEval/129 · fail · 0.0% · 21905ms · b1ae63818edc
Question
def minPath(grid, k):
"""
Given a grid with N rows and N columns (N >= 2) and a positive integer k,
each cell of the grid contains a value. Every integer in the range [1, N * N]
inclusive appears exactly once on the cells of the grid.
You have to find the minimum path of length k in the grid. You can start
from any cell, and in each step you can move to any of the neighbor cells,
in other words, you can go to cells which share an edge with you current
cell.
Please note that a path of length k means visiting exactly k cells (not
necessarily distinct).
You CANNOT go off the grid.
A path A (of length k) is considered less than a path B (of length k) if
after making the ordered lists of the values on the cells that A and B go
through (let's call them lst_A and lst_B), lst_A is lexicographically less
than lst_B, in other words, there exist an integer index i (1 <= i <= k)
such that lst_A[i] < lst_B[i] and for any j (1 <= j < i) we have
lst_A[j] = lst_B[j].
It is guaranteed that the answer is unique.
Return an ordered list of the values on the cells that the minimum path go through.
Examples:
Input: grid = [ [1,2,3], [4,5,6], [7,8,9]], k = 3
Output: [1, 2, 1]
Input: grid = [ [5,9,3], [4,1,6], [7,8,2]], k = 1
Output: [1]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def minPath(grid, k):
"""
Given a grid with N rows and N columns (N >= 2) and a positive integer k,
each cell of the grid contains a value. Every integer in the range [1, N * N]
inclusive appears exactly once on the cells of the grid.
You have to find the minimum path of length k in the grid. You can start
from any cell, and in each step you can move to any of the neighbor cells,
in other words, you can go to cells which share an edge with you current
cell.
Please note that a path of length k means visiting exactly k cells (not
necessarily distinct).
You CANNOT go off the grid.
A path A (of length k) is considered less than a path B (of length k) if
after making the ordered lists of the values on the cells that A and B go
through (let's call them lst_A and lst_B), lst_A is lexicographically less
than lst_B, in other words, there exist an integer index i (1 <= i <= k)
such that lst_A[i] < lst_B[i] and for any j (1 <= j < i) we have
lst_A[j] = lst_B[j].
It is guaranteed that the answer is unique.
Return an ordered list of the values on the cells that the minimum path go through.
Examples:
Input: grid = [ [1,2,3], [4,5,6], [7,8,9]], k = 3
Output: [1, 2, 1]
Input: grid = [ [5,9,3], [4,1,6], [7,8,2]], k = 1
Output: [1]
"""
Model answer
def minPath(grid, k):
    """
    Given a grid with N rows and N columns and a positive integer k,
    find the lexicographically smallest path of length k.
    """
    n = len(grid)
    # Find the minimum value in the grid and its position
    min_val = float('inf')
    start_pos = (0, 0)
    for r in range(n):
        for c in range(n):
            if grid[r][c] < min_val:
                min_val = grid[r][c]
                start_pos = (r, c)
    # If k == 1, the path is just the minimum element
    if k == 1:
        return [min_val]
    # To get the lexicographically smallest path, we start at the minimum element.
    # Then we look for the smallest neighbor of that minimum element.
    # Once we find the smallest neighbor, we can oscillate between the minimum
    # element and its smallest neighbor for the remainder of the path.
    # This is because any other path would either start with a larger element
    # or have a larger second element.
    r, c = start_pos
    min_neighbor = float('inf')
    for dr, dc in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:
            if grid[nr][nc] < min_neighbor:
                min_neighbor = grid[nr][nc]
    # The path will be [min_val, min_neighbor, min_val, min_neighbor, ...]
    path = []
    for i in range(k):
        if i % 2 == 0:
            path.append(min_val)
        else:
            path.append(min_neighbor)
    return path
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
File "/tmp/tmp2p8_49vr/solution.py", line 1
from any cell, and in each step you can move to any of the neighbor cells,
^^^^
SyntaxError: invalid syntax
Extracted:
fail · Gold: pass
sample 62 · humaneval-plus:HumanEval/101 · fail · 0.0% · 7571ms · 494a4417729c
Question
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Model answer
import re
def words_string(s):
    """
    You will be given a string of words separated by commas or spaces. Your task is
    to split the string into words and return an array of the words.
    For example:
    words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
    words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
    """
    return re.split(r'[,\s]+', s.strip(', ')) if s else []
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmps4d2fjrm/solution.py", line 47, in <module>
check(words_string)
File "/tmp/tmps4d2fjrm/solution.py", line 44, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmps4d2fjrm/solution.py", line 35, in assertion
assert np.allclose(out, exp, rtol=1e-07, atol=atol)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/numpy/_core/numeric.py", line 2341, in allclose
res = all(isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/numpy/_core/numeric.py", line 2461, in isclose
result = (less_equal(abs(x-y), atol + rtol * abs(y))
~^~
numpy._core._exceptions._UFuncNoLoopError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U1'), dtype('float64')) -> None
Extracted:
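The dtype pair in the error above, a one-character string array against a float64 array, is what NumPy produces when a list such as `[""]` is compared with an empty list. That is consistent with, though not proof of, an input made up only of separators: `re.split` returns `[""]` for an empty string instead of `[]`. The sketch below reproduces that behaviour with a copy of the recorded logic and shows one possible alternative that drops empty pieces. Both function names are illustrative, and the failing input itself is not shown on the page.

```python
import re


def words_string_recorded(s):
    """Copy of the recorded answer's logic, renamed for illustration."""
    return re.split(r"[,\s]+", s.strip(", ")) if s else []


def words_string_filtered(s):
    """Alternative sketch: split on the same separators, then drop empty pieces."""
    return [word for word in re.split(r"[,\s]+", s) if word]


print(words_string_recorded(" "))   # ['']
print(words_string_filtered(" "))   # []
print(words_string_filtered("Hi, my name is John"))
```

Both versions return the same list for the two docstring examples, so the difference only shows up on inputs that contain no words.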
fail · Gold: pass
by Stewart_of_Mars · shard 6 · 6/29/2026, 3:38:20 AM · cmqyo41tc08mepk01qwl5wt9e
92.2% · 107/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/79pass · 100.0% · 8499ms · 62a14af1dec7
Question
def decimal_to_binary(decimal):
"""You will be given a number in decimal form and your task is to convert it to
binary format. The function should return a string, with each character representing a binary
number. Each character in the string will be '0' or '1'.
There will be an extra couple of characters 'db' at the beginning and at the end of the string.
The extra characters are there to help with the format.
Examples:
decimal_to_binary(15) # returns "db1111db"
decimal_to_binary(32) # returns "db100000db"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def decimal_to_binary(decimal):
"""You will be given a number in decimal form and your task is to convert it to
binary format. The function should return a string, with each character representing a binary
number. Each character in the string will be '0' or '1'.
There will be an extra couple of characters 'db' at the beginning and at the end of the string.
The extra characters are there to help with the format.
Examples:
decimal_to_binary(15) # returns "db1111db"
decimal_to_binary(32) # returns "db100000db"
"""
Model answer
def decimal_to_binary(decimal):
    """You will be given a number in decimal form and your task is to convert it to
    binary format. The function should return a string, with each character representing a binary
    number. Each character in the string will be '0' or '1'.
    There will be an extra couple of characters 'db' at the beginning and at the end of the string.
    The extra characters are there to help with the format.
    Examples:
    decimal_to_binary(15) # returns "db1111db"
    decimal_to_binary(32) # returns "db100000db"
    """
    binary_string = bin(decimal)[2:]
    return f"db{binary_string}db"
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/162 · pass · 100.0% · 6023ms · 3ff16d224363
Question
def string_to_md5(text):
"""
Given a string 'text', return its md5 hash equivalent string.
If 'text' is an empty string, return None.
>>> string_to_md5('Hello world') == '3e25960a79dbc69b674cd4ec67a72c62'
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def string_to_md5(text):
"""
Given a string 'text', return its md5 hash equivalent string.
If 'text' is an empty string, return None.
>>> string_to_md5('Hello world') == '3e25960a79dbc69b674cd4ec67a72c62'
"""
Model answer
import hashlib
def string_to_md5(text):
    """
    Given a string 'text', return its md5 hash equivalent string.
    If 'text' is an empty string, return None.
    >>> string_to_md5('Hello world') == '3e25960a79dbc69b674cd4ec67a72c62'
    """
    if not text:
        return None
    return hashlib.md5(text.encode('utf-8')).hexdigest()
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/8
pass · 100.0% · 7923ms · 90793e9e2d04
Question
from typing import List, Tuple
def sum_product(numbers: List[int]) -> Tuple[int, int]:
""" For a given list of integers, return a tuple consisting of a sum and a product of all the integers in a list.
Empty sum should be equal to 0 and empty product should be equal to 1.
>>> sum_product([])
(0, 1)
>>> sum_product([1, 2, 3, 4])
(10, 24)
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
from typing import List, Tuple
def sum_product(numbers: List[int]) -> Tuple[int, int]:
""" For a given list of integers, return a tuple consisting of a sum and a product of all the integers in a list.
Empty sum should be equal to 0 and empty product should be equal to 1.
>>> sum_product([])
(0, 1)
>>> sum_product([1, 2, 3, 4])
(10, 24)
"""
Model answer
from typing import List, Tuple
def sum_product(numbers: List[int]) -> Tuple[int, int]:
    """ For a given list of integers, return a tuple consisting of a sum and a product of all the integers in a list.
    Empty sum should be equal to 0 and empty product should be equal to 1.
    >>> sum_product([])
    (0, 1)
    >>> sum_product([1, 2, 3, 4])
    (10, 24)
    """
    total_sum = 0
    total_product = 1
    for num in numbers:
        total_sum += num
        total_product *= num
    return total_sum, total_product
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/104
pass · 100.0% · 8916ms · a0ede5f5b3ae
Question
def unique_digits(x):
"""Given a list of positive integers x. return a sorted list of all
elements that hasn't any even digit.
Note: Returned list should be sorted in increasing order.
For example:
>>> unique_digits([15, 33, 1422, 1])
[1, 15, 33]
>>> unique_digits([152, 323, 1422, 10])
[]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def unique_digits(x):
"""Given a list of positive integers x. return a sorted list of all
elements that hasn't any even digit.
Note: Returned list should be sorted in increasing order.
For example:
>>> unique_digits([15, 33, 1422, 1])
[1, 15, 33]
>>> unique_digits([152, 323, 1422, 10])
[]
"""
Model answer
def unique_digits(x):
    """Given a list of positive integers x. return a sorted list of all
    elements that hasn't any even digit.
    Note: Returned list should be sorted in increasing order.
    For example:
    >>> unique_digits([15, 33, 1422, 1])
    [1, 15, 33]
    >>> unique_digits([152, 323, 1422, 10])
    []
    """
    result = []
    for num in x:
        # Check if all digits in the number are odd
        if all(int(digit) % 2 != 0 for digit in str(num)):
            result.append(num)
    return sorted(result)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 6 · humaneval-plus:HumanEval/122
pass · 100.0% · 9969ms · b993751fffe7
Question
def add_elements(arr, k):
"""
Given a non-empty array of integers arr and an integer k, return
the sum of the elements with at most two digits from the first k elements of arr.
Example:
Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
Output: 24 # sum of 21 + 3
Constraints:
1. 1 <= len(arr) <= 100
2. 1 <= k <= len(arr)
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def add_elements(arr, k):
"""
Given a non-empty array of integers arr and an integer k, return
the sum of the elements with at most two digits from the first k elements of arr.
Example:
Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
Output: 24 # sum of 21 + 3
Constraints:
1. 1 <= len(arr) <= 100
2. 1 <= k <= len(arr)
"""
Model answer
def add_elements(arr, k):
    """
    Given a non-empty array of integers arr and an integer k, return
    the sum of the elements with at most two digits from the first k elements of arr.
    Example:
    Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
    Output: 24 # sum of 21 + 3
    Constraints:
    1. 1 <= len(arr) <= 100
    2. 1 <= k <= len(arr)
    """
    total_sum = 0
    for i in range(k):
        # An integer has at most two digits if its absolute value is less than 100
        if abs(arr[i]) < 100:
            total_sum += arr[i]
    return total_sum
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 2 · humaneval-plus:HumanEval/132
fail · 0.0% · 28871ms · d65ed04e2d42
Question
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Model answer
def is_nested(string):
    '''
    Create a function that takes a string as input which contains only square brackets.
    The function should return True if and only if there is a valid subsequence of brackets
    where at least one bracket in the subsequence is nested.
    '''
    count = 0
    for char in string:
        if char == '[':
            count += 1
        elif char == ']':
            count -= 1
        # If count reaches 2, it means we have an open bracket
        # inside another open bracket, which allows for nesting.
        # However, we must ensure that this nesting is eventually closed.
        # The problem asks for a "valid subsequence", which implies
        # we just need to find if '[[...]]' exists within the string.
        if count >= 2:
            # Check if there are at least two closing brackets remaining
            # in the string to potentially close these nested brackets.
            if string.count(']') >= 2:
                # To be strictly a valid nested subsequence, we need to
                # ensure that at some point the depth was 2 and then returned to 0.
                # A simpler way: if we find '[[', and there are enough ']'
                # later to close them, it's nested.
                # But the most robust way is to track the maximum depth reached
                # during a valid pairing process.
                pass
    # Correct logic: A string contains a nested valid subsequence if
    # it contains the pattern '[[' and there are enough ']' to close them.
    # More accurately, if we can find a pair of brackets inside another pair.
    # This is equivalent to checking if the string contains '[[' and ']]'
    # such that the first '[' is closed by the last ']' and there is a pair inside.
    # Simplified: If we can find '[[', and there are at least two ']'
    # appearing after the first '[', we can potentially form a nested structure.
    # Actually, the simplest check for "is there a nested valid subsequence"
    # is checking if the string contains the substring '[[' and ']]'
    # is not quite right. The correct way is to see if the maximum depth
    # of a validly matched sequence is >= 2.
    stack = []
    max_depth = 0
    current_depth = 0
    # We only care about validly matched pairs.
    # We can use a stack to find all matching pairs and track their nesting level.
    for i, char in enumerate(string):
        if char == '[':
            stack.append(i)
        elif char == ']' and stack:
            start_index = stack.pop()
            # The depth of this pair is the number of open brackets
            # currently in the stack + 1
            depth = len(stack) + 1
            if depth >= 2:
                return True
    return False
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmp1ds_2749/solution.py", line 96, in <module>
    check(is_nested)
  File "/tmp/tmp1ds_2749/solution.py", line 93, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmp1ds_2749/solution.py", line 86, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 8 · humaneval-plus:HumanEval/97
fail · 0.0% · 6627ms · b7573e2145a3
Question
def multiply(a, b):
"""Complete the function that takes two integers and returns
the product of their unit digits.
Assume the input is always valid.
Examples:
multiply(148, 412) should return 16.
multiply(19, 28) should return 72.
multiply(2020, 1851) should return 0.
multiply(14,-15) should return 20.
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def multiply(a, b):
"""Complete the function that takes two integers and returns
the product of their unit digits.
Assume the input is always valid.
Examples:
multiply(148, 412) should return 16.
multiply(19, 28) should return 72.
multiply(2020, 1851) should return 0.
multiply(14,-15) should return 20.
"""
Model answer
def multiply(a, b):
    """Complete the function that takes two integers and returns
    the product of their unit digits.
    Assume the input is always valid.
    Examples:
    multiply(148, 412) should return 16.
    multiply(19, 28) should return 72.
    multiply(2020, 1851) should return 0.
    multiply(14,-15) should return 20.
    """
    return abs(a % 10) * abs(b % 10)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmp0xodk12c/solution.py", line 46, in <module>
    check(multiply)
  File "/tmp/tmp0xodk12c/solution.py", line 43, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmp0xodk12c/solution.py", line 36, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 18 · humaneval-plus:HumanEval/39
fail · 0.0% · 30408ms · c5befd80bb93
Question
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Model answer
def prime_fib(n: int):
    """
    prime_fib returns n-th number that is a Fibonacci number and it's also prime.
    >>> prime_fib(1)
    2
    >>> prime_fib(2)
    3
    >>> prime_fib(3)
    5
    >>> prime_fib(4)
    13
    >>> prime_fib(5)
    89
    """
    def is_prime(num):
        if num < 2:
            return False
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                return False
        return True
    count = 0
    a, b = 0, 1
    while True:
        a, b = b, a + b
        if is_prime(a):
            count += 1
            if count == n:
                return a
Reasoning trace
sandbox: passed=false returncode=-24 timed_out=false
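A negative return code usually means the process was killed by a signal, and signal 24 is SIGXCPU (CPU time limit exceeded) on Linux, so this failure looks like a resource limit rather than a wrong value. That reading is an assumption; the page does not document the sandbox limits. If it holds, the likely cost is the trial-division `is_prime`, whose loop grows with the square root of each Fibonacci candidate. A minimal sketch of a cheaper variant follows. It swaps in a Miller-Rabin test with fixed bases, which is not the model's method and has not been checked against the hidden tests.

```python
# Sketch only: Miller-Rabin with the first 12 primes as fixed bases.
# This base set is known to be deterministic for n below about 3e23;
# above that bound the test is only probabilistic (assumption of this sketch).
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_prime(n):
    if n < 2:
        return False
    # Cheap divisibility screen; also settles every n that is a small prime.
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in SMALL_PRIMES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            # No square reached n - 1, so base a witnesses that n is composite.
            return False
    return True


def prime_fib(n):
    count = 0
    a, b = 0, 1
    while True:
        a, b = b, a + b
        if is_prime(a):
            count += 1
            if count == n:
                return a
```

Each primality check now costs a handful of modular exponentiations instead of a loop up to the square root, so the cost per candidate stays small even for 17-digit Fibonacci numbers.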
Extracted:
fail · Gold: pass
sample 29 · humaneval-plus:HumanEval/145
fail · 0.0% · 12067ms · d5a81ec8d1f0
Question
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Model answer
def order_by_points(nums):
    """
    Write a function which sorts the given list of integers
    in ascending order according to the sum of their digits.
    Note: if there are several items with similar sum of their digits,
    order them based on their index in original list.
    For example:
    >>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
    >>> order_by_points([]) == []
    """
    def get_digit_sum(n):
        # Use absolute value to handle negative numbers as per the example
        # Example: -11 -> 1+1 = 2; -12 -> 1+2 = 3
        return sum(int(digit) for digit in str(abs(n)))
    # Python's sort is stable, so we only need to sort by the digit sum.
    # The original index order is preserved for items with the same key.
    return sorted(nums, key=get_digit_sum)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmpv6l1a6jf/solution.py", line 54, in <module>
    check(order_by_points)
  File "/tmp/tmpv6l1a6jf/solution.py", line 51, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmpv6l1a6jf/solution.py", line 44, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 55 · humaneval-plus:HumanEval/116
fail · 0.0% · 14016ms · e424308979c4
Question
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Model answer
def sort_array(arr):
    """
    In this Kata, you have to sort an array of non-negative integers according to
    number of ones in their binary representation in ascending order.
    For similar number of ones, sort based on decimal value.
    It must be implemented like this:
    >>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
    >>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
    >>> sort_array([1, 0, 2, 3, 4]) == [0, 1, 2, 3, 4]
    """
    # bin(x).count('1') counts the set bits for non-negative integers.
    # For negative integers, we use the absolute value or a specific bit-width
    # representation depending on the intended behavior, but based on the
    # provided test cases, sorting by (bit_count, value) handles the logic.
    return sorted(arr, key=lambda x: (bin(x).count('1') if x >= 0 else bin(x & 0xFFFFFFFF).count('1'), x))
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmpfg81g2i2/solution.py", line 51, in <module>
    check(sort_array)
  File "/tmp/tmpfg81g2i2/solution.py", line 48, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmpfg81g2i2/solution.py", line 41, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
by Stewart_of_Mars · shard 5 · 6/29/2026, 3:35:42 AM · cmqyo0o0y08ispk012o26vr16
92.2%
107/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/61
pass · 100.0% · 7298ms · a41d7abf5673
Question
def correct_bracketing(brackets: str):
""" brackets is a string of "(" and ")".
return True if every opening bracket has a corresponding closing bracket.
>>> correct_bracketing("(")
False
>>> correct_bracketing("()")
True
>>> correct_bracketing("(()())")
True
>>> correct_bracketing(")(()")
False
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def correct_bracketing(brackets: str):
""" brackets is a string of "(" and ")".
return True if every opening bracket has a corresponding closing bracket.
>>> correct_bracketing("(")
False
>>> correct_bracketing("()")
True
>>> correct_bracketing("(()())")
True
>>> correct_bracketing(")(()")
False
"""
Model answer
def correct_bracketing(brackets: str):
    """ brackets is a string of "(" and ")".
    return True if every opening bracket has a corresponding closing bracket.
    >>> correct_bracketing("(")
    False
    >>> correct_bracketing("()")
    True
    >>> correct_bracketing("(()())")
    True
    >>> correct_bracketing(")(()")
    False
    """
    balance = 0
    for char in brackets:
        if char == '(':
            balance += 1
        elif char == ')':
            balance -= 1
        if balance < 0:
            return False
    return balance == 0
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/122
pass · 100.0% · 10574ms · b993751fffe7
Question
def add_elements(arr, k):
"""
Given a non-empty array of integers arr and an integer k, return
the sum of the elements with at most two digits from the first k elements of arr.
Example:
Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
Output: 24 # sum of 21 + 3
Constraints:
1. 1 <= len(arr) <= 100
2. 1 <= k <= len(arr)
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def add_elements(arr, k):
"""
Given a non-empty array of integers arr and an integer k, return
the sum of the elements with at most two digits from the first k elements of arr.
Example:
Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
Output: 24 # sum of 21 + 3
Constraints:
1. 1 <= len(arr) <= 100
2. 1 <= k <= len(arr)
"""
Model answer
def add_elements(arr, k):
    """
    Given a non-empty array of integers arr and an integer k, return
    the sum of the elements with at most two digits from the first k elements of arr.
    Example:
    Input: arr = [111,21,3,4000,5,6,7,8,9], k = 4
    Output: 24 # sum of 21 + 3
    Constraints:
    1. 1 <= len(arr) <= 100
    2. 1 <= k <= len(arr)
    """
    total_sum = 0
    for i in range(k):
        # An integer has at most two digits if its absolute value is less than 100
        if abs(arr[i]) < 100:
            total_sum += arr[i]
    return total_sum
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/115
pass · 100.0% · 11317ms · 7539c8e8b1fe
Question
import math
def max_fill(grid, capacity):
"""
You are given a rectangular grid of wells. Each row represents a single well,
and each 1 in a row represents a single unit of water.
Each well has a corresponding bucket that can be used to extract water from it,
and all buckets have the same capacity.
Your task is to use the buckets to empty the wells.
Output the number of times you need to lower the buckets.
Example 1:
Input:
grid : [[0,0,1,0], [0,1,0,0], [1,1,1,1]]
bucket_capacity : 1
Output: 6
Example 2:
Input:
grid : [[0,0,1,1], [0,0,0,0], [1,1,1,1], [0,1,1,1]]
bucket_capacity : 2
Output: 5
Example 3:
Input:
grid : [[0,0,0], [0,0,0]]
bucket_capacity : 5
Output: 0
Constraints:
* all wells have the same length
* 1 <= grid.length <= 10^2
* 1 <= grid[:,1].length <= 10^2
* grid[i][j] -> 0 | 1
* 1 <= capacity <= 10
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
import math
def max_fill(grid, capacity):
"""
You are given a rectangular grid of wells. Each row represents a single well,
and each 1 in a row represents a single unit of water.
Each well has a corresponding bucket that can be used to extract water from it,
and all buckets have the same capacity.
Your task is to use the buckets to empty the wells.
Output the number of times you need to lower the buckets.
Example 1:
Input:
grid : [[0,0,1,0], [0,1,0,0], [1,1,1,1]]
bucket_capacity : 1
Output: 6
Example 2:
Input:
grid : [[0,0,1,1], [0,0,0,0], [1,1,1,1], [0,1,1,1]]
bucket_capacity : 2
Output: 5
Example 3:
Input:
grid : [[0,0,0], [0,0,0]]
bucket_capacity : 5
Output: 0
Constraints:
* all wells have the same length
* 1 <= grid.length <= 10^2
* 1 <= grid[:,1].length <= 10^2
* grid[i][j] -> 0 | 1
* 1 <= capacity <= 10
"""
Model answer
import math
def max_fill(grid, capacity):
    """
    You are given a rectangular grid of wells. Each row represents a single well,
    and each 1 in a row represents a single unit of water.
    Each well has a corresponding bucket that can be used to extract water from it,
    and all buckets have the same capacity.
    Your task is to use the buckets to empty the wells.
    Output the number of times you need to lower the buckets.
    """
    total_lowers = 0
    for row in grid:
        # Count the number of 1s in the current well (row)
        water_units = sum(row)
        if water_units > 0:
            # Calculate how many times the bucket must be lowered for this specific well
            # Using math.ceil to handle the remaining water that doesn't fill a whole bucket
            total_lowers += math.ceil(water_units / capacity)
    return total_lowers
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/128
pass · 100.0% · 12170ms · cc992acc2e74
Question
def prod_signs(arr):
"""
You are given an array arr of integers and you need to return
sum of magnitudes of integers multiplied by product of all signs
of each number in the array, represented by 1, -1 or 0.
Note: return None for empty arr.
Example:
>>> prod_signs([1, 2, 2, -4]) == -9
>>> prod_signs([0, 1]) == 0
>>> prod_signs([]) == None
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def prod_signs(arr):
"""
You are given an array arr of integers and you need to return
sum of magnitudes of integers multiplied by product of all signs
of each number in the array, represented by 1, -1 or 0.
Note: return None for empty arr.
Example:
>>> prod_signs([1, 2, 2, -4]) == -9
>>> prod_signs([0, 1]) == 0
>>> prod_signs([]) == None
"""
Model answer
def prod_signs(arr):
    """
    You are given an array arr of integers and you need to return
    sum of magnitudes of integers multiplied by product of all signs
    of each number in the array, represented by 1, -1 or 0.
    Note: return None for empty arr.
    Example:
    >>> prod_signs([1, 2, 2, -4]) == -9
    >>> prod_signs([0, 1]) == 0
    >>> prod_signs([]) == None
    """
    if not arr:
        return None
    product_of_signs = 1
    sum_of_magnitudes = 0
    for x in arr:
        sum_of_magnitudes += abs(x)
        if x > 0:
            product_of_signs *= 1
        elif x < 0:
            product_of_signs *= -1
        else:
            product_of_signs *= 0
    return sum_of_magnitudes * product_of_signs
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/138
pass · 100.0% · 7381ms · d18483126cd8
Question
def is_equal_to_sum_even(n):
"""Evaluate whether the given number n can be written as the sum of exactly 4 positive even numbers
Example
is_equal_to_sum_even(4) == False
is_equal_to_sum_even(6) == False
is_equal_to_sum_even(8) == True
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_equal_to_sum_even(n):
"""Evaluate whether the given number n can be written as the sum of exactly 4 positive even numbers
Example
is_equal_to_sum_even(4) == False
is_equal_to_sum_even(6) == False
is_equal_to_sum_even(8) == True
"""
Model answer
def is_equal_to_sum_even(n):
    """Evaluate whether the given number n can be written as the sum of exactly 4 positive even numbers
    Example
    is_equal_to_sum_even(4) == False
    is_equal_to_sum_even(6) == False
    is_equal_to_sum_even(8) == True
    """
    # The smallest sum of 4 positive even numbers is 2 + 2 + 2 + 2 = 8.
    # Any sum of even numbers must be even.
    return n >= 8 and n % 2 == 0
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 20 · humaneval-plus:HumanEval/32
fail · 0.0% · 25867ms · 250cd50d8dd1
Question
import math
def poly(xs: list, x: float):
"""
Evaluates polynomial with coefficients xs at point x.
return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
"""
return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
""" xs are coefficients of a polynomial.
find_zero find x such that poly(x) = 0.
find_zero returns only only zero point, even if there are many.
Moreover, find_zero only takes list xs having even number of coefficients
and largest non zero coefficient as it guarantees
a solution.
>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
-0.5
>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
1.0
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
import math
def poly(xs: list, x: float):
"""
Evaluates polynomial with coefficients xs at point x.
return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
"""
return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
""" xs are coefficients of a polynomial.
find_zero find x such that poly(x) = 0.
find_zero returns only only zero point, even if there are many.
Moreover, find_zero only takes list xs having even number of coefficients
and largest non zero coefficient as it guarantees
a solution.
>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
-0.5
>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
1.0
"""
Model answer
import math
def poly(xs: list, x: float):
    """
    Evaluates polynomial with coefficients xs at point x.
    return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
    """
    return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
    """ xs are coefficients of a polynomial.
    find_zero find x such that poly(x) = 0.
    find_zero returns only only zero point, even if there are many.
    Moreover, find_zero only takes list xs having even number of coefficients
    and largest non zero coefficient as it guarantees
    a solution.
    >>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
    -0.5
    >>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
    1.0
    """
    # Bisection method
    # Since the polynomial has an odd degree (even number of coefficients),
    # it must have at least one real root.
    # Find a range [a, b] where the function changes sign
    # Cauchy's bound for roots: |x| <= 1 + max(|a_i| / |a_n|)
    a_n = xs[-1]
    max_coeff = max(abs(c) for c in xs)
    bound = 1 + max_coeff / abs(a_n)
    low = -bound
    high = bound
    # Ensure we have a sign change
    if poly(xs, low) * poly(xs, high) > 0:
        # This case shouldn't happen for odd degree polynomials with the Cauchy bound,
        # but we can expand the search if necessary.
        pass
    # Bisection loop
    for _ in range(100):
        mid = (low + high) / 2
        if abs(poly(xs, mid)) < 1e-12:
            return mid
        if poly(xs, low) * poly(xs, mid) < 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmp4b8crzny/solution.py", line 74, in <module>
    check(find_zero)
  File "/tmp/tmp4b8crzny/solution.py", line 71, in check
    assert _poly(*candidate(*inp), inp) <= 0.0001
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Value after * must be an iterable, not float
Extracted:
fail · Gold: pass
sample 35 · humaneval-plus:HumanEval/132
fail · 0.0% · 8147ms · d65ed04e2d42
Question
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Model answer
def is_nested(string):
    '''
    Create a function that takes a string as input which contains only square brackets.
    The function should return True if and only if there is a valid subsequence of brackets
    where at least one bracket in the subsequence is nested.
    '''
    stack = []
    for char in string:
        if char == '[':
            stack.append(char)
        elif char == ']':
            if stack:
                stack.pop()
                # If we pop a bracket and there is still at least one '['
                # remaining in the stack, it means the bracket we just
                # closed was nested inside another.
                if stack:
                    return True
    return False
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
  File "/tmp/tmp77xdmmhb/solution.py", line 54, in <module>
    check(is_nested)
  File "/tmp/tmp77xdmmhb/solution.py", line 51, in check
    assertion(candidate(*inp), exp, 0)
  File "/tmp/tmp77xdmmhb/solution.py", line 44, in assertion
    assert exact_match
           ^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 60 · humaneval-plus:HumanEval/163
fail · 0.0% · 30435ms · 46cd1b674899
Question
def generate_integers(a, b):
"""
Given two positive integers a and b, return the even digits between a
and b, in ascending order.
For example:
generate_integers(2, 8) => [2, 4, 6, 8]
generate_integers(8, 2) => [2, 4, 6, 8]
generate_integers(10, 14) => []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def generate_integers(a, b):
"""
Given two positive integers a and b, return the even digits between a
and b, in ascending order.
For example:
generate_integers(2, 8) => [2, 4, 6, 8]
generate_integers(8, 2) => [2, 4, 6, 8]
generate_integers(10, 14) => []
"""
Model answer
def generate_integers(a, b):
    """
    Given two positive integers a and b, return the even digits between a
    and b, in ascending order.
    For example:
    generate_integers(2, 8) => [2, 4, 6, 8]
    generate_integers(8, 2) => [2, 4, 6, 8]
    generate_integers(10, 14) => []
    """
    start = min(a, b)
    end = max(a, b)
    # The problem specifies "even digits", implying numbers 0-9.
    # We filter for numbers that are even and are single digits.
    result = [i for i in range(start, end + 1) if i % 2 == 0 and 0 <= i <= 9]
    return result
Reasoning trace
sandbox: passed=false returncode=-24 timed_out=false
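A return code of -24 typically indicates the process was killed by signal 24 (SIGXCPU, CPU time limit exceeded, on Linux), so this run probably ran out of CPU time rather than returning a wrong list. That interpretation is an assumption, since the sandbox limits are not documented here. The answer above iterates over every integer from `start` to `end`, although only single-digit values can pass the filter, so a very large bound makes the loop expensive. A minimal sketch that clamps the upper bound before iterating is shown below. It reproduces the three docstring examples but has not been verified against the hidden tests.

```python
def generate_integers(a, b):
    """Sketch: even single-digit integers between a and b, inclusive."""
    lo, hi = min(a, b), max(a, b)
    # Only the digits 0-9 can qualify, so never iterate past 9.
    hi = min(hi, 9)
    return [i for i in range(lo, hi + 1) if i % 2 == 0]
```

With the clamp in place the loop runs at most ten times, whatever the size of the inputs.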
Extracted:
fail · Gold: pass
sample 75 · humaneval-plus:HumanEval/103
fail · 0.0% · 17623ms · 9fac21aafd37
Question
def rounded_avg(n, m):
"""You are given two positive integers n and m, and your task is to compute the
average of the integers from n through m (including n and m).
Round the answer to the nearest integer and convert that to binary.
If n is greater than m, return -1.
Example:
rounded_avg(1, 5) => "0b11"
rounded_avg(7, 5) => -1
rounded_avg(10, 20) => "0b1111"
rounded_avg(20, 33) => "0b11010"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def rounded_avg(n, m):
"""You are given two positive integers n and m, and your task is to compute the
average of the integers from n through m (including n and m).
Round the answer to the nearest integer and convert that to binary.
If n is greater than m, return -1.
Example:
rounded_avg(1, 5) => "0b11"
rounded_avg(7, 5) => -1
rounded_avg(10, 20) => "0b1111"
rounded_avg(20, 33) => "0b11010"
"""
Model answer
def rounded_avg(n, m):
"""You are given two positive integers n and m, and your task is to compute the
average of the integers from n through m (including n and m).
Round the answer to the nearest integer and convert that to binary.
If n is greater than m, return -1.
Example:
rounded_avg(1, 5) => "0b11"
rounded_avg(7, 5) => -1
rounded_avg(10, 20) => "0b1111"
rounded_avg(20, 33) => "0b11010"
"""
if n > m:
return -1
# The sum of integers from n to m is (n + m) * (m - n + 1) / 2
# The average is (n + m) / 2
avg = (n + m) / 2
# Round to nearest integer. Python's round() rounds to the nearest even number
# for .5 cases, but standard mathematical rounding for this problem
# usually implies rounding .5 up. However, for (n+m)/2,
# int(avg + 0.5) handles the standard "round half up" logic.
rounded_val = int(avg + 0.5) if avg % 1 == 0.5 else round(avg)
# To be safe and consistent with common competitive programming expectations
# for "nearest integer" where .5 goes up:
import math
rounded_val = math.floor(avg + 0.5)
return bin(rounded_val)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpzx7jez24/solution.py", line 65, in <module>
check(rounded_avg)
File "/tmp/tmpzx7jez24/solution.py", line 62, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpzx7jez24/solution.py", line 55, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
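The docstring example rounded_avg(20, 33) => "0b11010" is 26, while the average is 26.5, so the expected rounding sends this tie to the even neighbour rather than up. The answer above ends with math.floor(avg + 0.5), which yields 27 ("0b11011") for that input. A minimal sketch that reproduces all four docstring examples with Python's built-in round() (round half to even); that it also matches every hidden HumanEval+ test is an assumption:

```python
def rounded_avg(n, m):
    if n > m:
        return -1
    # The mean of the consecutive integers n..m is (n + m) / 2.
    # round() resolves .5 ties to the nearest even integer.
    return bin(round((n + m) / 2))
```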
fail · Gold: pass
sample 79 · humaneval-plus:HumanEval/101 · fail · 0.0% · 7811ms · 494a4417729c
Question
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Model answer
import re
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
return re.split(r'[,\s]+', s.strip(' ,')) if s.strip(' ,') else []
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpd3r69tvw/solution.py", line 47, in <module>
check(words_string)
File "/tmp/tmpd3r69tvw/solution.py", line 44, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpd3r69tvw/solution.py", line 37, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
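The failing input is not shown in this trace. One possible cause is that strip(' ,') removes only spaces and commas, so a string that starts or ends with other whitespace (a tab or newline, for instance) leaves an empty string at the edge of the re.split result. A minimal sketch that avoids this by turning commas into spaces and relying on str.split(), which discards empty fields; it is an illustration, not necessarily the reference solution:

```python
def words_string(s):
    # str.split() with no arguments splits on any whitespace run and
    # never produces empty strings, so no edge trimming is needed.
    return s.replace(",", " ").split()
```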
fail · Gold: pass
by Stewart_of_Mars · shard 4 · 6/29/2026, 3:33:49 AM · cmqyny8a408f3pk013sq4q8vo · 93.1% · 108/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/16 · pass · 100.0% · 4242ms · 199b00c3653a
Question
def count_distinct_characters(string: str) -> int:
""" Given a string, find out how many distinct characters (regardless of case) does it consist of
>>> count_distinct_characters('xyzXYZ')
3
>>> count_distinct_characters('Jerry')
4
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def count_distinct_characters(string: str) -> int:
""" Given a string, find out how many distinct characters (regardless of case) does it consist of
>>> count_distinct_characters('xyzXYZ')
3
>>> count_distinct_characters('Jerry')
4
"""
Model answer
def count_distinct_characters(string: str) -> int:
""" Given a string, find out how many distinct characters (regardless of case) does it consist of
>>> count_distinct_characters('xyzXYZ')
3
>>> count_distinct_characters('Jerry')
4
"""
return len(set(string.lower()))
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/155 · pass · 100.0% · 6955ms · 803353390d0c
Question
def even_odd_count(num):
"""Given an integer. return a tuple that has the number of even and odd digits respectively.
Example:
even_odd_count(-12) ==> (1, 1)
even_odd_count(123) ==> (1, 2)
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def even_odd_count(num):
"""Given an integer. return a tuple that has the number of even and odd digits respectively.
Example:
even_odd_count(-12) ==> (1, 1)
even_odd_count(123) ==> (1, 2)
"""
Model answer
def even_odd_count(num):
"""Given an integer. return a tuple that has the number of even and odd digits respectively.
Example:
even_odd_count(-12) ==> (1, 1)
even_odd_count(123) ==> (1, 2)
"""
evens = 0
odds = 0
for digit in str(abs(num)):
if int(digit) % 2 == 0:
evens += 1
else:
odds += 1
return (evens, odds)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/128 · pass · 100.0% · 12025ms · cc992acc2e74
Question
def prod_signs(arr):
"""
You are given an array arr of integers and you need to return
sum of magnitudes of integers multiplied by product of all signs
of each number in the array, represented by 1, -1 or 0.
Note: return None for empty arr.
Example:
>>> prod_signs([1, 2, 2, -4]) == -9
>>> prod_signs([0, 1]) == 0
>>> prod_signs([]) == None
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def prod_signs(arr):
"""
You are given an array arr of integers and you need to return
sum of magnitudes of integers multiplied by product of all signs
of each number in the array, represented by 1, -1 or 0.
Note: return None for empty arr.
Example:
>>> prod_signs([1, 2, 2, -4]) == -9
>>> prod_signs([0, 1]) == 0
>>> prod_signs([]) == None
"""
Model answer
def prod_signs(arr):
"""
You are given an array arr of integers and you need to return
sum of magnitudes of integers multiplied by product of all signs
of each number in the array, represented by 1, -1 or 0.
Note: return None for empty arr.
Example:
>>> prod_signs([1, 2, 2, -4]) == -9
>>> prod_signs([0, 1]) == 0
>>> prod_signs([]) == None
"""
if not arr:
return None
product_of_signs = 1
sum_of_magnitudes = 0
for x in arr:
sum_of_magnitudes += abs(x)
if x > 0:
product_of_signs *= 1
elif x < 0:
product_of_signs *= -1
else:
product_of_signs *= 0
return sum_of_magnitudes * product_of_signs
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/115 · pass · 100.0% · 10325ms · 7539c8e8b1fe
Question
import math
def max_fill(grid, capacity):
"""
You are given a rectangular grid of wells. Each row represents a single well,
and each 1 in a row represents a single unit of water.
Each well has a corresponding bucket that can be used to extract water from it,
and all buckets have the same capacity.
Your task is to use the buckets to empty the wells.
Output the number of times you need to lower the buckets.
Example 1:
Input:
grid : [[0,0,1,0], [0,1,0,0], [1,1,1,1]]
bucket_capacity : 1
Output: 6
Example 2:
Input:
grid : [[0,0,1,1], [0,0,0,0], [1,1,1,1], [0,1,1,1]]
bucket_capacity : 2
Output: 5
Example 3:
Input:
grid : [[0,0,0], [0,0,0]]
bucket_capacity : 5
Output: 0
Constraints:
* all wells have the same length
* 1 <= grid.length <= 10^2
* 1 <= grid[:,1].length <= 10^2
* grid[i][j] -> 0 | 1
* 1 <= capacity <= 10
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
import math
def max_fill(grid, capacity):
"""
You are given a rectangular grid of wells. Each row represents a single well,
and each 1 in a row represents a single unit of water.
Each well has a corresponding bucket that can be used to extract water from it,
and all buckets have the same capacity.
Your task is to use the buckets to empty the wells.
Output the number of times you need to lower the buckets.
Example 1:
Input:
grid : [[0,0,1,0], [0,1,0,0], [1,1,1,1]]
bucket_capacity : 1
Output: 6
Example 2:
Input:
grid : [[0,0,1,1], [0,0,0,0], [1,1,1,1], [0,1,1,1]]
bucket_capacity : 2
Output: 5
Example 3:
Input:
grid : [[0,0,0], [0,0,0]]
bucket_capacity : 5
Output: 0
Constraints:
* all wells have the same length
* 1 <= grid.length <= 10^2
* 1 <= grid[:,1].length <= 10^2
* grid[i][j] -> 0 | 1
* 1 <= capacity <= 10
"""
Model answer
import math
def max_fill(grid, capacity):
"""
You are given a rectangular grid of wells. Each row represents a single well,
and each 1 in a row represents a single unit of water.
Each well has a corresponding bucket that can be used to extract water from it,
and all buckets have the same capacity.
Your task is to use the buckets to empty the wells.
Output the number of times you need to lower the buckets.
"""
total_lowers = 0
for row in grid:
water_in_well = sum(row)
if water_in_well > 0:
# Calculate how many times the bucket must be lowered for this specific well
# using ceiling division: (water + capacity - 1) // capacity
total_lowers += math.ceil(water_in_well / capacity)
return total_lowers
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/60 · pass · 100.0% · 6611ms · 1933719e1040
Question
def sum_to_n(n: int):
"""sum_to_n is a function that sums numbers from 1 to n.
>>> sum_to_n(30)
465
>>> sum_to_n(100)
5050
>>> sum_to_n(5)
15
>>> sum_to_n(10)
55
>>> sum_to_n(1)
1
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def sum_to_n(n: int):
"""sum_to_n is a function that sums numbers from 1 to n.
>>> sum_to_n(30)
465
>>> sum_to_n(100)
5050
>>> sum_to_n(5)
15
>>> sum_to_n(10)
55
>>> sum_to_n(1)
1
"""
Model answer
def sum_to_n(n: int):
"""sum_to_n is a function that sums numbers from 1 to n.
>>> sum_to_n(30)
465
>>> sum_to_n(100)
5050
>>> sum_to_n(5)
15
>>> sum_to_n(10)
55
>>> sum_to_n(1)
1
"""
return n * (n + 1) // 2
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 10 · humaneval-plus:HumanEval/99 · fail · 0.0% · 18521ms · 125aa68cf8ce
Question
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
Model answer
import decimal
def closest_integer(value):
'''
Create a function that takes a value (string) representing a number
and returns the closest integer to it. If the number is equidistant
from two integers, round it away from zero.
Examples
>>> closest_integer("10")
10
>>> closest_integer("15.3")
15
Note:
Rounding away from zero means that if the given number is equidistant
from two integers, the one you should return is the one that is the
farthest from zero. For example closest_integer("14.5") should
return 15 and closest_integer("-14.5") should return -15.
'''
# Use decimal module to handle precision and specific rounding modes
# ROUND_HALF_UP rounds towards the nearest neighbor, with ties going away from zero
# for positive numbers. For negative numbers, we handle the sign to ensure
# it always moves away from zero.
d = decimal.Decimal(value)
rounded = d.quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)
# decimal.ROUND_HALF_UP rounds 14.5 to 15 and -14.5 to -14.
# To achieve "away from zero" for negatives, we can use ROUND_HALF_UP on the absolute value.
abs_d = abs(d)
rounded_abs = abs_d.quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP)
return int(rounded_abs) if d >= 0 else -int(rounded_abs)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp6q9rz_xl/solution.py", line 70, in <module>
check(closest_integer)
File "/tmp/tmp6q9rz_xl/solution.py", line 67, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp6q9rz_xl/solution.py", line 60, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 13 · humaneval-plus:HumanEval/91 · fail · 0.0% · 13469ms · 57147f23be53
Question
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Model answer
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
import re
# Split the string by any of the delimiters '.', '?', or '!'
sentences = re.split(r'[.?!]', S)
count = 0
for sentence in sentences:
# Strip leading whitespace and check if the sentence starts with "I"
# We check if it starts with "I " or is exactly "I" to ensure it's the word "I"
stripped = sentence.strip()
if stripped.startswith("I"):
# Ensure "I" is a standalone word (followed by space or end of string)
if len(stripped) == 1 or stripped[1] == ' ':
count += 1
return count
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpnmwcpz43/solution.py", line 60, in <module>
check(is_bored)
File "/tmp/tmpnmwcpz43/solution.py", line 57, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpnmwcpz43/solution.py", line 50, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 29 · humaneval-plus:HumanEval/76 · fail · 0.0% · 11152ms · baea024f4bf6
Question
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Model answer
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
if x == 1:
return True
if n == 1:
return x == 1
if n == 0:
return x == 0
power = 1
while power < x:
power *= n
return power == x
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp10nsm9nt/solution.py", line 59, in <module>
check(is_simple_power)
File "/tmp/tmp10nsm9nt/solution.py", line 56, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp10nsm9nt/solution.py", line 49, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 39 · humaneval-plus:HumanEval/132 · fail · 0.0% · 7699ms · d65ed04e2d42
Question
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Model answer
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
'''
stack = []
for char in string:
if char == '[':
stack.append(char)
elif char == ']':
if stack:
stack.pop()
# If we just closed a bracket and there is still another
# open bracket in the stack, it means the closed one was nested.
if stack:
return True
return False
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpwxaf6txk/solution.py", line 53, in <module>
check(is_nested)
File "/tmp/tmpwxaf6txk/solution.py", line 50, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpwxaf6txk/solution.py", line 43, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
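The answer above returns True as soon as any bracket closes while another is still open, so an input such as '[[]' (only one closing bracket) would be accepted even though no balanced, nested subsequence can be formed from it. Under the reading that such a subsequence exists exactly when "[[]]" is itself a subsequence of the input, a minimal sketch is a greedy subsequence match. This is one possible approach, not necessarily the reference solution, and the failing input is not shown in the trace:

```python
def is_nested(string):
    pattern = "[[]]"  # smallest balanced sequence with nesting
    matched = 0
    for ch in string:
        # Greedily consume the next needed character of the pattern.
        if matched < len(pattern) and ch == pattern[matched]:
            matched += 1
    return matched == len(pattern)
```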
fail · Gold: pass
sample 40 · humaneval-plus:HumanEval/145 · fail · 0.0% · 12674ms · d5a81ec8d1f0
Question
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Model answer
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
def get_digit_sum(n):
# Use absolute value to handle negative numbers as per the example
# Example: -11 -> 1+1 = 2; -12 -> 1+2 = 3
return sum(int(digit) for digit in str(abs(n)))
# Python's sort is stable, so we only need to sort by the digit sum.
# The original relative order (index) will be preserved for equal keys.
return sorted(nums, key=get_digit_sum)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp5n2g8atc/solution.py", line 54, in <module>
check(order_by_points)
File "/tmp/tmp5n2g8atc/solution.py", line 51, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp5n2g8atc/solution.py", line 44, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
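The docstring example expects [-1, -11, 1, -12, 11], which only works if the leading digit of a negative number carries the sign (-11 gives -1 + 1 = 0, -12 gives -1 + 2 = 1). With abs() applied first, as in the answer above, the same input sorts to [1, -1, 11, -11, -12]. A minimal sketch of the signed-digit interpretation, inferred from that single example rather than from the hidden tests:

```python
def order_by_points(nums):
    def signed_digit_sum(n):
        digits = [int(d) for d in str(abs(n))]
        if n < 0:
            # The first digit of a negative number counts as negative.
            digits[0] = -digits[0]
        return sum(digits)

    # sorted() is stable, so ties keep their original index order.
    return sorted(nums, key=signed_digit_sum)
```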
fail · Gold: pass
by Stewart_of_Mars · shard 3 · 6/29/2026, 3:32:18 AM · cmqynwadu08bhpk01gpnq6pie · 94.0% · 109/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/126 · pass · 100.0% · 17836ms · 17f6d5efd852
Question
def is_sorted(lst):
'''
Given a list of numbers, return whether or not they are sorted
in ascending order. If list has more than 1 duplicate of the same
number, return False. Assume no negative numbers and only integers.
Examples
is_sorted([5]) ➞ True
is_sorted([1, 2, 3, 4, 5]) ➞ True
is_sorted([1, 3, 2, 4, 5]) ➞ False
is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_sorted(lst):
'''
Given a list of numbers, return whether or not they are sorted
in ascending order. If list has more than 1 duplicate of the same
number, return False. Assume no negative numbers and only integers.
Examples
is_sorted([5]) ➞ True
is_sorted([1, 2, 3, 4, 5]) ➞ True
is_sorted([1, 3, 2, 4, 5]) ➞ False
is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
'''
Model answer
def is_sorted(lst):
'''
Given a list of numbers, return whether or not they are sorted
in ascending order. If list has more than 1 duplicate of the same
number, return False. Assume no negative numbers and only integers.
Examples
is_sorted([5]) ➞ True
is_sorted([1, 2, 3, 4, 5]) ➞ True
is_sorted([1, 3, 2, 4, 5]) ➞ False
is_sorted([1, 2, 3, 4, 5, 6]) ➞ True
is_sorted([1, 2, 3, 4, 5, 6, 7]) ➞ True
is_sorted([1, 3, 2, 4, 5, 6, 7]) ➞ False
is_sorted([1, 2, 2, 3, 3, 4]) ➞ True
is_sorted([1, 2, 2, 2, 3, 4]) ➞ False
'''
# Check if sorted in ascending order
for i in range(len(lst) - 1):
if lst[i] > lst[i + 1]:
return False
# Check if any number appears more than twice
counts = {}
for num in lst:
counts[num] = counts.get(num, 0) + 1
if counts[num] > 2:
return False
return True
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/140 · pass · 100.0% · 12698ms · 7bec51cd74df
Question
def fix_spaces(text):
"""
Given a string text, replace all spaces in it with underscores,
and if a string has more than 2 consecutive spaces,
then replace all consecutive spaces with -
fix_spaces("Example") == "Example"
fix_spaces("Example 1") == "Example_1"
fix_spaces(" Example 2") == "_Example_2"
fix_spaces(" Example 3") == "_Example-3"
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def fix_spaces(text):
"""
Given a string text, replace all spaces in it with underscores,
and if a string has more than 2 consecutive spaces,
then replace all consecutive spaces with -
fix_spaces("Example") == "Example"
fix_spaces("Example 1") == "Example_1"
fix_spaces(" Example 2") == "_Example_2"
fix_spaces(" Example 3") == "_Example-3"
"""
Model answer
import re
def fix_spaces(text):
"""
Given a string text, replace all spaces in it with underscores,
and if a string has more than 2 consecutive spaces,
then replace all consecutive spaces with -
fix_spaces("Example") == "Example"
fix_spaces("Example 1") == "Example_1"
fix_spaces(" Example 2") == "_Example_2"
fix_spaces(" Example 3") == "_Example-3"
"""
# Replace 3 or more spaces with '-'
text = re.sub(r' {3,}', '-', text)
# Replace remaining single or double spaces with '_'
text = text.replace(' ', '_')
# If double spaces were replaced by '__', we need to ensure
# the logic follows the prompt's specific examples.
# The prompt says "more than 2 consecutive spaces" -> '-',
# otherwise "replace all spaces" -> '_'.
# This means 1 space -> '_', 2 spaces -> '__', 3+ spaces -> '-'
return text
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/133 · pass · 100.0% · 9169ms · 2f1345202416
Question
def sum_squares(lst):
"""You are given a list of numbers.
You need to return the sum of squared numbers in the given list,
round each element in the list to the upper int(Ceiling) first.
Examples:
For lst = [1,2,3] the output should be 14
For lst = [1,4,9] the output should be 98
For lst = [1,3,5,7] the output should be 84
For lst = [1.4,4.2,0] the output should be 29
For lst = [-2.4,1,1] the output should be 6
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def sum_squares(lst):
"""You are given a list of numbers.
You need to return the sum of squared numbers in the given list,
round each element in the list to the upper int(Ceiling) first.
Examples:
For lst = [1,2,3] the output should be 14
For lst = [1,4,9] the output should be 98
For lst = [1,3,5,7] the output should be 84
For lst = [1.4,4.2,0] the output should be 29
For lst = [-2.4,1,1] the output should be 6
"""
Model answer
import math
def sum_squares(lst):
"""You are given a list of numbers.
You need to return the sum of squared numbers in the given list,
round each element in the list to the upper int(Ceiling) first.
Examples:
For lst = [1,2,3] the output should be 14
For lst = [1,4,9] the output should be 98
For lst = [1,3,5,7] the output should be 84
For lst = [1.4,4.2,0] the output should be 29
For lst = [-2.4,1,1] the output should be 6
"""
return sum(math.ceil(x)**2 for x in lst)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/57 · pass · 100.0% · 7365ms · 139c9b8b06e8
Question
def monotonic(l: list):
"""Return True is list elements are monotonically increasing or decreasing.
>>> monotonic([1, 2, 4, 20])
True
>>> monotonic([1, 20, 4, 10])
False
>>> monotonic([4, 1, 0, -10])
True
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def monotonic(l: list):
"""Return True is list elements are monotonically increasing or decreasing.
>>> monotonic([1, 2, 4, 20])
True
>>> monotonic([1, 20, 4, 10])
False
>>> monotonic([4, 1, 0, -10])
True
"""
Model answer
def monotonic(l: list):
"""Return True is list elements are monotonically increasing or decreasing.
>>> monotonic([1, 2, 4, 20])
True
>>> monotonic([1, 20, 4, 10])
False
>>> monotonic([4, 1, 0, -10])
True
"""
return (all(l[i] <= l[i + 1] for i in range(len(l) - 1)) or
all(l[i] >= l[i + 1] for i in range(len(l) - 1)))
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 6 · humaneval-plus:HumanEval/35 · pass · 100.0% · 5418ms · 3dcc6f882e29
Question
def max_element(l: list):
"""Return maximum element in the list.
>>> max_element([1, 2, 3])
3
>>> max_element([5, 3, -5, 2, -3, 3, 9, 0, 123, 1, -10])
123
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def max_element(l: list):
"""Return maximum element in the list.
>>> max_element([1, 2, 3])
3
>>> max_element([5, 3, -5, 2, -3, 3, 9, 0, 123, 1, -10])
123
"""
Model answer
def max_element(l: list):
"""Return maximum element in the list.
>>> max_element([1, 2, 3])
3
>>> max_element([5, 3, -5, 2, -3, 3, 9, 0, 123, 1, -10])
123
"""
if not l:
return None
return max(l)
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 4 · humaneval-plus:HumanEval/32 · fail · 0.0% · 29213ms · 250cd50d8dd1
Question
import math
def poly(xs: list, x: float):
"""
Evaluates polynomial with coefficients xs at point x.
return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
"""
return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
""" xs are coefficients of a polynomial.
find_zero find x such that poly(x) = 0.
find_zero returns only only zero point, even if there are many.
Moreover, find_zero only takes list xs having even number of coefficients
and largest non zero coefficient as it guarantees
a solution.
>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
-0.5
>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
1.0
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
import math
def poly(xs: list, x: float):
"""
Evaluates polynomial with coefficients xs at point x.
return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
"""
return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])
def find_zero(xs: list):
""" xs are coefficients of a polynomial.
find_zero find x such that poly(x) = 0.
find_zero returns only only zero point, even if there are many.
Moreover, find_zero only takes list xs having even number of coefficients
and largest non zero coefficient as it guarantees
a solution.
>>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
-0.5
>>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
1.0
"""
Model answer
import math


def poly(xs: list, x: float):
    """
    Evaluates polynomial with coefficients xs at point x.
    return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
    """
    return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])


def find_zero(xs: list):
    """ xs are coefficients of a polynomial.
    find_zero find x such that poly(x) = 0.
    find_zero returns only only zero point, even if there are many.
    Moreover, find_zero only takes list xs having even number of coefficients
    and largest non zero coefficient as it guarantees
    a solution.
    >>> round(find_zero([1, 2]), 2) # f(x) = 1 + 2x
    -0.5
    >>> round(find_zero([-6, 11, -6, 1]), 2) # (x - 1) * (x - 2) * (x - 3) = -6 + 11x - 6x^2 + x^3
    1.0
    """
    # Bisection method
    # Since the polynomial has an even number of coefficients, its degree is odd.
    # An odd-degree polynomial always has at least one real root.
    # Find a range [a, b] where the function changes sign.
    # Cauchy's bound: all roots lie within |x| < 1 + max(|a_i| / |a_n|)
    n = len(xs) - 1
    a_n = xs[-1]
    max_coeff = max(abs(c) for c in xs)
    bound = 1 + max_coeff / abs(a_n)
    low = -bound
    high = bound
    # Ensure we have a sign change
    f_low = poly(xs, low)
    f_high = poly(xs, high)
    # If the bound doesn't provide a sign change (rare for odd degree),
    # expand until it does.
    while f_low * f_high > 0:
        low *= 2
        high *= 2
        f_low = poly(xs, low)
        f_high = poly(xs, high)
    # Bisection loop
    for _ in range(100):
        mid = (low + high) / 2
        f_mid = poly(xs, mid)
        if f_mid == 0:
            return mid
        if f_low * f_mid < 0:
            high = mid
            f_high = f_mid
        else:
            low = mid
            f_low = f_mid
    return (low + high) / 2
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpottjisct/solution.py", line 83, in <module>
check(find_zero)
File "/tmp/tmpottjisct/solution.py", line 80, in check
assert _poly(*candidate(*inp), inp) <= 0.0001
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Value after * must be an iterable, not float
Extracted:
fail · Gold: pass
sample 23 · humaneval-plus:HumanEval/91 · fail · 0.0% · 13528ms · 57147f23be53
Question
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Model answer
def is_bored(S):
    """
    You'll be given a string of words, and your task is to count the number
    of boredoms. A boredom is a sentence that starts with the word "I".
    Sentences are delimited by '.', '?' or '!'.
    For example:
    >>> is_bored("Hello world")
    0
    >>> is_bored("The sky is blue. The sun is shining. I love this weather")
    1
    """
    import re
    # Split the string by any of the delimiters '.', '?', or '!'
    sentences = re.split(r'[.?!]', S)
    count = 0
    for sentence in sentences:
        # Strip leading whitespace and check if the sentence starts with "I"
        # We check if the first word is exactly "I" followed by a space or end of string
        stripped = sentence.strip()
        if stripped.startswith("I"):
            # Ensure it's the word "I" and not just a word starting with I (like "Ice")
            if len(stripped) == 1 or stripped[1] == ' ':
                count += 1
    return count
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpoi_61tqy/solution.py", line 60, in <module>
check(is_bored)
File "/tmp/tmpoi_61tqy/solution.py", line 57, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpoi_61tqy/solution.py", line 50, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 50 · humaneval-plus:HumanEval/39 · fail · 0.0% · 31038ms · c5befd80bb93
Question
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def prime_fib(n: int):
"""
prime_fib returns n-th number that is a Fibonacci number and it's also prime.
>>> prime_fib(1)
2
>>> prime_fib(2)
3
>>> prime_fib(3)
5
>>> prime_fib(4)
13
>>> prime_fib(5)
89
"""
Model answer
def prime_fib(n: int):
    """
    prime_fib returns n-th number that is a Fibonacci number and it's also prime.
    >>> prime_fib(1)
    2
    >>> prime_fib(2)
    3
    >>> prime_fib(3)
    5
    >>> prime_fib(4)
    13
    >>> prime_fib(5)
    89
    """
    def is_prime(num):
        if num < 2:
            return False
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                return False
        return True
    count = 0
    a, b = 0, 1
    while True:
        if is_prime(a):
            count += 1
            if count == n:
                return a
        a, b = b, a + b
Reasoning trace
sandbox: passed=false returncode=-24 timed_out=false
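A negative `returncode` with `timed_out=false` usually means the child process was terminated by a signal rather than exiting on its own: Python's `subprocess` reports such exits as the negated signal number on POSIX systems. The sketch below decodes the value with a hypothetical helper, `describe_returncode`. On Linux, signal 24 is normally `SIGXCPU` (CPU time limit exceeded), but the mapping is platform-dependent and the harness's resource limits are not shown in this trace.

```python
import signal


def describe_returncode(returncode):
    # subprocess reports "killed by signal N" as returncode == -N (POSIX only).
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {-returncode} ({name})"


print(describe_returncode(1))
print(describe_returncode(-24))
```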
Extracted:
fail · Gold: pass
sample 70 · humaneval-plus:HumanEval/145 · fail · 0.0% · 12512ms · d5a81ec8d1f0
Question
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def order_by_points(nums):
"""
Write a function which sorts the given list of integers
in ascending order according to the sum of their digits.
Note: if there are several items with similar sum of their digits,
order them based on their index in original list.
For example:
>>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
>>> order_by_points([]) == []
"""
Model answer
def order_by_points(nums):
    """
    Write a function which sorts the given list of integers
    in ascending order according to the sum of their digits.
    Note: if there are several items with similar sum of their digits,
    order them based on their index in original list.
    For example:
    >>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
    >>> order_by_points([]) == []
    """
    def get_digit_sum(n):
        # Use absolute value to handle negative numbers as per the example
        # Example: -11 -> 1+1 = 2; -12 -> 1+2 = 3
        return sum(int(digit) for digit in str(abs(n)))
    # Python's sort is stable, so we only need to sort by the digit sum
    # to maintain the original relative order for items with the same sum.
    return sorted(nums, key=get_digit_sum)
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpvxwtfojv/solution.py", line 54, in <module>
check(order_by_points)
File "/tmp/tmpvxwtfojv/solution.py", line 51, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpvxwtfojv/solution.py", line 44, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 80 · humaneval-plus:HumanEval/116 · fail · 0.0% · 14081ms · e424308979c4
Question
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def sort_array(arr):
"""
In this Kata, you have to sort an array of non-negative integers according to
number of ones in their binary representation in ascending order.
For similar number of ones, sort based on decimal value.
It must be implemented like this:
>>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
>>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
>>> sort_array([1, 0, 2, 3, 4]) [0, 1, 2, 3, 4]
"""
Model answer
def sort_array(arr):
    """
    In this Kata, you have to sort an array of non-negative integers according to
    number of ones in their binary representation in ascending order.
    For similar number of ones, sort based on decimal value.
    It must be implemented like this:
    >>> sort_array([1, 5, 2, 3, 4]) == [1, 2, 3, 4, 5]
    >>> sort_array([-2, -3, -4, -5, -6]) == [-6, -5, -4, -3, -2]
    >>> sort_array([1, 0, 2, 3, 4]) == [0, 1, 2, 4, 3]
    """
    # bin(x).count('1') counts the set bits for non-negative integers.
    # For negative integers, bin() returns '-0b...', so we handle the absolute
    # value or specific bit representation if required, but based on the
    # prompt's logic for "non-negative integers" and the provided examples:
    return sorted(arr, key=lambda x: (bin(x).count('1') if x >= 0 else bin(x & 0xFFFFFFFF).count('1'), x))
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp_0sglukh/solution.py", line 51, in <module>
check(sort_array)
File "/tmp/tmp_0sglukh/solution.py", line 48, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp_0sglukh/solution.py", line 41, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
by Stewart_of_Mars · shard 2 · 6/29/2026, 3:29:57 AM · cmqynt9tb087vpk01qr68o3u6
93.1% · 108/116 correct · 5 correct traces · 5 incorrect traces
Correct samples
sample 1 · humaneval-plus:HumanEval/58 · pass · 100.0% · 7116ms · 79737ca98da8
Question
def common(l1: list, l2: list):
"""Return sorted unique common elements for two lists.
>>> common([1, 4, 3, 34, 653, 2, 5], [5, 7, 1, 5, 9, 653, 121])
[1, 5, 653]
>>> common([5, 3, 2, 8], [3, 2])
[2, 3]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def common(l1: list, l2: list):
"""Return sorted unique common elements for two lists.
>>> common([1, 4, 3, 34, 653, 2, 5], [5, 7, 1, 5, 9, 653, 121])
[1, 5, 653]
>>> common([5, 3, 2, 8], [3, 2])
[2, 3]
"""
Model answer
def common(l1: list, l2: list):
    """Return sorted unique common elements for two lists.
    >>> common([1, 4, 3, 34, 653, 2, 5], [5, 7, 1, 5, 9, 653, 121])
    [1, 5, 653]
    >>> common([5, 3, 2, 8], [3, 2])
    [2, 3]
    """
    return sorted(list(set(l1) & set(l2)))
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 2 · humaneval-plus:HumanEval/109 · pass · 100.0% · 23884ms · 98dad1fbf586
Question
def move_one_ball(arr):
"""We have an array 'arr' of N integers arr[1], arr[2], ..., arr[N].The
numbers in the array will be randomly ordered. Your task is to determine if
it is possible to get an array sorted in non-decreasing order by performing
the following operation on the given array:
You are allowed to perform right shift operation any number of times.
One right shift operation means shifting all elements of the array by one
position in the right direction. The last element of the array will be moved to
the starting position in the array i.e. 0th index.
If it is possible to obtain the sorted array by performing the above operation
then return True else return False.
If the given array is empty then return True.
Note: The given list is guaranteed to have unique elements.
For Example:
move_one_ball([3, 4, 5, 1, 2])==>True
Explanation: By performin 2 right shift operations, non-decreasing order can
be achieved for the given array.
move_one_ball([3, 5, 4, 1, 2])==>False
Explanation:It is not possible to get non-decreasing order for the given
array by performing any number of right shift operations.
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def move_one_ball(arr):
"""We have an array 'arr' of N integers arr[1], arr[2], ..., arr[N].The
numbers in the array will be randomly ordered. Your task is to determine if
it is possible to get an array sorted in non-decreasing order by performing
the following operation on the given array:
You are allowed to perform right shift operation any number of times.
One right shift operation means shifting all elements of the array by one
position in the right direction. The last element of the array will be moved to
the starting position in the array i.e. 0th index.
If it is possible to obtain the sorted array by performing the above operation
then return True else return False.
If the given array is empty then return True.
Note: The given list is guaranteed to have unique elements.
For Example:
move_one_ball([3, 4, 5, 1, 2])==>True
Explanation: By performin 2 right shift operations, non-decreasing order can
be achieved for the given array.
move_one_ball([3, 5, 4, 1, 2])==>False
Explanation:It is not possible to get non-decreasing order for the given
array by performing any number of right shift operations.
"""
Model answer
def move_one_ball(arr):
    """We have an array 'arr' of N integers arr[1], arr[2], ..., arr[N].The
    numbers in the array will be randomly ordered. Your task is to determine if
    it is possible to get an array sorted in non-decreasing order by performing
    the following operation on the given array:
    You are allowed to perform right shift operation any number of times.
    One right shift operation means shifting all elements of the array by one
    position in the right direction. The last element of the array will be moved to
    the starting position in the array i.e. 0th index.
    If it is possible to obtain the sorted array by performing the above operation
    then return True else return False.
    If the given array is empty then return True.
    Note: The given list is guaranteed to have unique elements.
    For Example:
    move_one_ball([3, 4, 5, 1, 2])==>True
    Explanation: By performin 2 right shift operations, non-decreasing order can
    be achieved for the given array.
    move_one_ball([3, 5, 4, 1, 2])==>False
    Explanation:It is not possible to get non-decreasing order for the given
    array by performing any number of right shift operations.
    """
    if not arr:
        return True
    n = len(arr)
    # A right-shifted sorted array can have at most one point where arr[i] > arr[i+1]
    # If it has exactly one such point, the last element must be less than or equal to the first element
    # to ensure that shifting it back results in a sorted array.
    count_drops = 0
    for i in range(n - 1):
        if arr[i] > arr[i + 1]:
            count_drops += 1
    if count_drops == 0:
        return True
    elif count_drops == 1:
        # Check if the "wrap-around" is also sorted (last element <= first element)
        return arr[n - 1] <= arr[0]
    else:
        return False
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 3 · humaneval-plus:HumanEval/9 · pass · 100.0% · 8458ms · d0ae3d210f9e
Question
from typing import List, Tuple
def rolling_max(numbers: List[int]) -> List[int]:
""" From a given list of integers, generate a list of rolling maximum element found until given moment
in the sequence.
>>> rolling_max([1, 2, 3, 2, 3, 4, 2])
[1, 2, 3, 3, 3, 4, 4]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
from typing import List, Tuple
def rolling_max(numbers: List[int]) -> List[int]:
""" From a given list of integers, generate a list of rolling maximum element found until given moment
in the sequence.
>>> rolling_max([1, 2, 3, 2, 3, 4, 2])
[1, 2, 3, 3, 3, 4, 4]
"""
Model answer
from typing import List, Tuple


def rolling_max(numbers: List[int]) -> List[int]:
    """ From a given list of integers, generate a list of rolling maximum element found until given moment
    in the sequence.
    >>> rolling_max([1, 2, 3, 2, 3, 4, 2])
    [1, 2, 3, 3, 3, 4, 4]
    """
    if not numbers:
        return []
    result = []
    current_max = numbers[0]
    for num in numbers:
        if num > current_max:
            current_max = num
        result.append(current_max)
    return result
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 4 · humaneval-plus:HumanEval/14 · pass · 100.0% · 4174ms · 34feeba6cad2
Question
from typing import List
def all_prefixes(string: str) -> List[str]:
""" Return list of all prefixes from shortest to longest of the input string
>>> all_prefixes('abc')
['a', 'ab', 'abc']
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
from typing import List
def all_prefixes(string: str) -> List[str]:
""" Return list of all prefixes from shortest to longest of the input string
>>> all_prefixes('abc')
['a', 'ab', 'abc']
"""
Model answer
from typing import List


def all_prefixes(string: str) -> List[str]:
    """ Return list of all prefixes from shortest to longest of the input string
    >>> all_prefixes('abc')
    ['a', 'ab', 'abc']
    """
    return [string[:i+1] for i in range(len(string))]
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
sample 5 · humaneval-plus:HumanEval/36 · pass · 100.0% · 7739ms · 3a34ca57ea48
Question
def fizz_buzz(n: int):
"""Return the number of times the digit 7 appears in integers less than n which are divisible by 11 or 13.
>>> fizz_buzz(50)
0
>>> fizz_buzz(78)
2
>>> fizz_buzz(79)
3
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def fizz_buzz(n: int):
"""Return the number of times the digit 7 appears in integers less than n which are divisible by 11 or 13.
>>> fizz_buzz(50)
0
>>> fizz_buzz(78)
2
>>> fizz_buzz(79)
3
"""
Model answer
def fizz_buzz(n: int):
    """Return the number of times the digit 7 appears in integers less than n which are divisible by 11 or 13.
    >>> fizz_buzz(50)
    0
    >>> fizz_buzz(78)
    2
    >>> fizz_buzz(79)
    3
    """
    count = 0
    for i in range(n):
        if i % 11 == 0 or i % 13 == 0:
            count += str(i).count('7')
    return count
Reasoning trace
sandbox: passed=true returncode=0 timed_out=false
Extracted:
pass · Gold: pass
Incorrect samples
sample 11 · humaneval-plus:HumanEval/132 · fail · 0.0% · 7999ms · d65ed04e2d42
Question
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_nested(string):
'''
Create a function that takes a string as input which contains only square brackets.
The function should return True if and only if there is a valid subsequence of brackets
where at least one bracket in the subsequence is nested.
is_nested('[[]]') ➞ True
is_nested('[]]]]]]][[[[[]') ➞ False
is_nested('[][]') ➞ False
is_nested('[]') ➞ False
is_nested('[[][]]') ➞ True
is_nested('[[]][[') ➞ True
'''
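The six docstring examples can be run as a quick local check before submitting an answer. The sketch below does that with a hypothetical `failing_examples` helper and tries one candidate, `depth_heuristic`, which reports nesting as soon as a `]` closes at depth two or more. That heuristic disagrees with the second example.

```python
DOCSTRING_EXAMPLES = [
    ("[[]]", True),
    ("[]]]]]]][[[[[]", False),
    ("[][]", False),
    ("[]", False),
    ("[[][]]", True),
    ("[[]][[", True),
]


def failing_examples(candidate):
    # Return the inputs on which the candidate disagrees with the docstring.
    return [s for s, want in DOCSTRING_EXAMPLES if candidate(s) is not want]


def depth_heuristic(string):
    # Hypothetical candidate: report nesting as soon as a ']' closes
    # while more than one '[' is still open.
    depth = 0
    for ch in string:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            if depth > 1:
                return True
            depth -= 1
    return False


print(failing_examples(depth_heuristic))  # ['[]]]]]]][[[[[]']
```

In the second example only one `]` follows the final run of `[`, so no nested pair can be completed; a nested subsequence needs at least two closing brackets after two opening ones.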
Model answer
def is_nested(string):
    '''
    Create a function that takes a string as input which contains only square brackets.
    The function should return True if and only if there is a valid subsequence of brackets
    where at least one bracket in the subsequence is nested.
    '''
    stack = []
    for char in string:
        if char == '[':
            stack.append(char)
        elif char == ']':
            if stack:
                # If we find a closing bracket and there's more than one
                # opening bracket in the stack, it means this pair is nested.
                if len(stack) > 1:
                    return True
                stack.pop()
    return False
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp7o6aliei/solution.py", line 53, in <module>
check(is_nested)
File "/tmp/tmp7o6aliei/solution.py", line 50, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp7o6aliei/solution.py", line 43, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 29 · humaneval-plus:HumanEval/163 · fail · 0.0% · 32615ms · 46cd1b674899
Question
def generate_integers(a, b):
"""
Given two positive integers a and b, return the even digits between a
and b, in ascending order.
For example:
generate_integers(2, 8) => [2, 4, 6, 8]
generate_integers(8, 2) => [2, 4, 6, 8]
generate_integers(10, 14) => []
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def generate_integers(a, b):
"""
Given two positive integers a and b, return the even digits between a
and b, in ascending order.
For example:
generate_integers(2, 8) => [2, 4, 6, 8]
generate_integers(8, 2) => [2, 4, 6, 8]
generate_integers(10, 14) => []
"""
Model answer
def generate_integers(a, b):
    """
    Given two positive integers a and b, return the even digits between a
    and b, in ascending order.
    For example:
    generate_integers(2, 8) => [2, 4, 6, 8]
    generate_integers(8, 2) => [2, 4, 6, 8]
    generate_integers(10, 14) => []
    """
    start = min(a, b)
    end = max(a, b)
    # The problem specifies "even digits", which are 0, 2, 4, 6, 8.
    # Since a and b are positive integers, we check for even numbers
    # in the range [start, end] that are also single digits.
    result = []
    for i in range(start, end + 1):
        if 0 <= i <= 9 and i % 2 == 0:
            result.append(i)
    return result
Reasoning trace
sandbox: passed=false returncode=-24 timed_out=false
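An exit with `returncode=-24` and no traceback is consistent with the process being killed by a signal, possibly a CPU limit; that is an inference, not something the trace states. The recorded answer visits every integer between `a` and `b`, which is linear in their distance. Under the literal "even digits" reading only values in 0..8 can qualify, so the scan can be clamped. The sketch below uses a hypothetical name, `even_digits_between`, and has only been checked against the three docstring examples, not the hidden tests.

```python
def even_digits_between(a, b):
    lo, hi = min(a, b), max(a, b)
    # Only single-digit values can be "even digits", so never scan past 9.
    return [i for i in range(max(lo, 0), min(hi, 9) + 1) if i % 2 == 0]


print(even_digits_between(2, 8))       # [2, 4, 6, 8]
print(even_digits_between(10, 14))     # []
print(even_digits_between(1, 10**12))  # [2, 4, 6, 8]
```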
Extracted:
fail · Gold: pass
sample 32 · humaneval-plus:HumanEval/76 · fail · 0.0% · 10770ms · baea024f4bf6
Question
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_simple_power(x, n):
"""Your task is to write a function that returns true if a number x is a simple
power of n and false in other cases.
x is a simple power of n if n**int=x
For example:
is_simple_power(1, 4) => true
is_simple_power(2, 2) => true
is_simple_power(8, 2) => true
is_simple_power(3, 2) => false
is_simple_power(3, 1) => false
is_simple_power(5, 3) => false
"""
Model answer
def is_simple_power(x, n):
    """Your task is to write a function that returns true if a number x is a simple
    power of n and false in other cases.
    x is a simple power of n if n**int=x
    For example:
    is_simple_power(1, 4) => true
    is_simple_power(2, 2) => true
    is_simple_power(8, 2) => true
    is_simple_power(3, 2) => false
    is_simple_power(3, 1) => false
    is_simple_power(5, 3) => false
    """
    if x == 1:
        return True
    if n <= 1:
        return x == n
    power = n
    while power <= x:
        if power == x:
            return True
        power *= n
    return False
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp86dyuk_f/solution.py", line 58, in <module>
check(is_simple_power)
File "/tmp/tmp86dyuk_f/solution.py", line 55, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp86dyuk_f/solution.py", line 48, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 54 · humaneval-plus:HumanEval/101 · fail · 0.0% · 7559ms · 494a4417729c
Question
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def words_string(s):
"""
You will be given a string of words separated by commas or spaces. Your task is
to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
"""
Model answer
import re


def words_string(s):
    """
    You will be given a string of words separated by commas or spaces. Your task is
    to split the string into words and return an array of the words.
    For example:
    words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
    words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
    """
    return re.split(r'[,\s]+', s.strip(', ')) if s.strip(', ') else []
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmp3ase2603/solution.py", line 47, in <module>
check(words_string)
File "/tmp/tmp3ase2603/solution.py", line 44, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmp3ase2603/solution.py", line 37, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass
sample 59 · humaneval-plus:HumanEval/91 · fail · 0.0% · 11637ms · 57147f23be53
Question
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Rendered prompt
Complete the following Python function. Reply with the complete implementation in a single ```python code block and no prose.
def is_bored(S):
"""
You'll be given a string of words, and your task is to count the number
of boredoms. A boredom is a sentence that starts with the word "I".
Sentences are delimited by '.', '?' or '!'.
For example:
>>> is_bored("Hello world")
0
>>> is_bored("The sky is blue. The sun is shining. I love this weather")
1
"""
Model answer
import re


def is_bored(S):
    """
    You'll be given a string of words, and your task is to count the number
    of boredoms. A boredom is a sentence that starts with the word "I".
    Sentences are delimited by '.', '?' or '!'.
    For example:
    >>> is_bored("Hello world")
    0
    >>> is_bored("The sky is blue. The sun is shining. I love this weather")
    1
    """
    # Split the string by any of the delimiters '.', '?', or '!'
    sentences = re.split(r'[.?!]', S)
    count = 0
    for sentence in sentences:
        # Remove leading whitespace to check the first word
        stripped = sentence.lstrip()
        # Check if the sentence starts with the word "I" followed by a space or end of string
        if stripped.startswith("I ") or stripped == "I":
            count += 1
    return count
Reasoning trace
sandbox: passed=false returncode=1 timed_out=false
Traceback (most recent call last):
File "/tmp/tmpqdcnv79m/solution.py", line 61, in <module>
check(is_bored)
File "/tmp/tmpqdcnv79m/solution.py", line 58, in check
assertion(candidate(*inp), exp, 0)
File "/tmp/tmpqdcnv79m/solution.py", line 51, in assertion
assert exact_match
^^^^^^^^^^^
AssertionError
Extracted:
fail · Gold: pass