HellaSwag

Official shard eval

Deterministic, disjoint eval shards drawn from Rowan/hellaswag:validation. Running all 13 shards covers the full 10,042-question dataset; Postgres stores only shard metadata and pass/fail results.

Source

Category: Reasoning · Eval type: Shard eval · Questions: 10,042 · Shards: 13 · Runs: 28

Dry-run first:

lmx eval shard hellaswag --base-url http://localhost:8000 --questions 371 --dry-run

Then submit with a real model and hardware profile:

lmx eval shard hellaswag --base-url http://localhost:8000 --questions 371 --model <hfId> --hardware hardware.json --submit

Scores are pooled by unique question_id; the leaderboard is ranked by the Wilson 95% lower bound. HellaSwag uses canonical continuation loglikelihood scoring. Use a logprob-capable /v1/completions endpoint, or pass --model-path model.gguf to use the bundled local llama.cpp scorer.
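As a rough illustration of continuation loglikelihood scoring, the sketch below picks the ending whose tokens receive the highest summed log-probability given the context. The function name, the input shape (one list of per-token logprobs per ending), and the optional length normalization are assumptions made for illustration; the harness's actual scorer and its normalization choice are not documented on this page.

```python
from typing import Sequence

LETTERS = "ABCD"


def pick_ending(ending_logprobs: Sequence[Sequence[float]], normalize: bool = False) -> int:
    """Return the index of the ending with the highest loglikelihood.

    ending_logprobs[i] holds the per-token logprobs of ending i, conditioned
    on the shared context (hypothetical input shape).
    """
    if not ending_logprobs:
        raise ValueError("need at least one ending")
    scores = []
    for token_lps in ending_logprobs:
        total = sum(token_lps)
        # Optional per-token normalization, so longer endings are not penalized.
        if normalize and token_lps:
            total /= len(token_lps)
        scores.append(total)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up logprobs for four endings, for demonstration only.
example = [
    [-2.1, -0.9, -1.5],
    [-0.4, -0.3, -0.8],
    [-3.0, -2.2],
    [-1.0, -1.1, -1.2, -1.3],
]
extracted = LETTERS[pick_ending(example)]
```

A question counts as a pass when the extracted letter equals the gold letter, which is how the "Extracted · Gold" lines in the traces read.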

Leaderboard

Qwopus3.6-27B-Coder-MTP-GGUF · Q5_K_M · gguf · unknown protocol · unknown agent

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF · 1 run · 1/13 shards · harness-scoped

92.9%

95% CI 90.8–94.5%

717/772 correct · 7.7% coverage

Gemma-4-31B-IT-NVFP4 · unspecified quant · unknown protocol · unknown agent

nvidia/Gemma-4-31B-IT-NVFP4 · 12 runs · 12/13 shards · harness-scoped

80.9%

95% CI 79.7–82.0%

3600/4452 correct · 44.3% coverage

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

nvidia/Qwen3.6-27B-NVFP4 · 13 runs · 13/13 shards · harness-scoped

79.8%

95% CI 78.6–80.9%

3848/4823 correct · 48.0% coverage

gemma-4-12b-it-GGUF · Q8_K_XL · gguf

unsloth/gemma-4-12b-it-GGUF · 1 run · 1/13 shards · harness-scoped

54.0%

95% CI 44.3–63.4%

54/100 correct · 1.0% coverage

gemma-4-26B-A4B-it-qat-GGUF · Q4_K_XL · gguf

unsloth/gemma-4-26B-A4B-it-qat-GGUF · 1 run · 1/13 shards · harness-scoped

52.0%

95% CI 42.3–61.5%

52/100 correct · 1.0% coverage
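The 95% CI and coverage figures in the leaderboard entries above can be checked with the standard Wilson score interval. This is a minimal sketch, assuming z = 1.96 and coverage defined as unique questions answered divided by the 10,042-question dataset; the function and constant names are illustrative, not taken from the harness.

```python
import math

DATASET_SIZE = 10_042  # full HellaSwag validation set, per the page header


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(717, 772)
coverage = 772 / DATASET_SIZE
print(f"{low:.1%} to {high:.1%}, coverage {coverage:.1%}")
# prints "90.8% to 94.5%, coverage 7.7%"
```

Ranking by the lower bound rather than the raw accuracy penalizes entries with few questions: 54/100 has a lower bound near 44.3%, far below its 54.0% point estimate.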

Stability: historical rerun transparency

Leaderboard rank uses the canonical latest approved answer per question_id. These metrics include historical submissions too, so reruns and changed answers are visible but do not drive rank.

Qwopus3.6-27B-Coder-MTP-GGUF · Q5_K_M · gguf · unknown protocol · unknown agent

1 historical run · 772 unique questions

Canonical

92.9%

Row avg

92.9%

Run avg

92.9%

Repeated

Changed

Gemma-4-31B-IT-NVFP4 · unspecified quant · unknown protocol · unknown agent

12 historical runs · 4,452 unique questions

Canonical

80.9%

Row avg

80.9%

Run avg

80.9%

Repeated

Changed

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

13 historical runs · 4,823 unique questions

Canonical

79.8%

Row avg

79.8%

Run avg

79.8%

Repeated

Changed

gemma-4-12b-it-GGUF · Q8_K_XL · gguf

0 historical runs · 100 unique questions

Canonical

54.0%

Row avg

—

Run avg

—

Repeated

Changed

gemma-4-26B-A4B-it-qat-GGUF · Q4_K_XL · gguf

0 historical runs · 100 unique questions

Canonical

52.0%

Row avg

—

Run avg

—

Repeated

Changed
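One plausible reading of the stability columns above is sketched here. The definitions are assumptions, not confirmed by this page: Canonical is accuracy over the latest answer per question_id, Row avg is accuracy over every submitted row, Run avg is the mean of per-run accuracies, Repeated counts question_ids answered more than once, and Changed counts repeated question_ids whose pass/fail result differed. The Row record and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    run_id: str
    submitted_at: int  # any sortable timestamp (hypothetical field)
    question_id: str
    passed: bool


def accuracy(flags) -> float:
    flags = list(flags)
    return sum(flags) / len(flags)


def stability(rows: list[Row]) -> dict:
    """Compute the assumed stability metrics from historical rows."""
    if not rows:
        raise ValueError("need at least one row")
    by_question = defaultdict(list)
    by_run = defaultdict(list)
    for row in rows:
        by_question[row.question_id].append(row)
        by_run[row.run_id].append(row.passed)
    # Canonical: keep only the most recent answer for each question.
    latest = [max(group, key=lambda r: r.submitted_at) for group in by_question.values()]
    repeated = [g for g in by_question.values() if len(g) > 1]
    changed = [g for g in repeated if len({r.passed for r in g}) > 1]
    return {
        "canonical": accuracy(r.passed for r in latest),
        "row_avg": accuracy(r.passed for r in rows),
        "run_avg": accuracy(accuracy(flags) for flags in by_run.values()),
        "repeated": len(repeated),
        "changed": len(changed),
    }


# Two made-up runs that overlap on question q2.
demo = [
    Row("r1", 1, "q1", True),
    Row("r1", 1, "q2", False),
    Row("r2", 2, "q2", True),
    Row("r2", 2, "q3", False),
]
metrics = stability(demo)
```

When shards are disjoint and no run is repeated, as for the entries listed above, the three averages coincide, which matches the identical Canonical, Row avg, and Run avg values shown.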

Runs: sample traces per run

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 13 · 7/1/2026, 1:18:15 PM · cmr23piuc04nkpi01c9juykg5

77.9%

289/371 correct · 5 correct traces · 5 incorrect traces

Correct samples

sample 2 · hellaswag:00006547 · pass · 100.0% · 4401ms · e498ae6053dd

Question

[header] How to buy organic soil [title] Do your research. [step] Truly organic soil can be hard to come by since the market for organic potting media has only begun increasing in the past few years. Moreover, organic standards have also become stricter as the demand for organic goods has grown, so a soil that was once deemed organic may no longer fulfill those requirements legally.

A. To survive, the best option is to research the quality of organic soil you buy and see how they relate with your home or ecosystem. [substeps] Make sure that your soil is organic.
B. Determining what is organic will help you decide whether it would be a good idea to buy the soil you need. [substeps] There are two main kinds of organic soil: organic and non-organic.
C. Before shopping for an organic soil, inform yourself so that you know what to expect. Traces of chemicals can be common, especially in soils containing certain types of manure, but if the amount is low enough it could still qualify as organic.
D. [title] Buy organic potting soil instead of commercial ones. [step] Changing to commercial potting soil will reduce the amount of organic overgrowth in your field and reduce the amount of organic manure produced by your cultivars' buckers.

Rendered prompt

[header] How to buy organic soil [title] Do your research. [step] Truly organic soil can be hard to come by since the market for organic potting media has only begun increasing in the past few years. Moreover, organic standards have also become stricter as the demand for organic goods has grown, so a soil that was once deemed organic may no longer fulfill those requirements legally.

Model answer

Extracted: C · Gold: C

sample 3 · hellaswag:00008485 · pass · 100.0% · 4400ms · ef5ba648c703

Question

[header] How to burp a sleeping baby [title] Hold your baby and burp them. [step] This technique is good for babies who sleep on their stomach or who like to cuddle when they sleep. [substeps] Slowly move your baby next to your body so you do not wake them up.

A. Allow your baby's head or chin to rest on your shoulder and cup their bottom to support them so they don't slip as you hold them. Place your other hand on their back and gently pat it to help them burp.
B. You can also lower your baby's head and neck down into a deep belly position, which will help you burp your baby more easily. Gently nudge your baby at first and again until they are used to this position and begin to adjust to the feel of their neck and pillow.
C. Hold your baby close with your hand and record the amount of air they intake and how much you burp each time before you burp them. This way, you can do it for several days to see how comfortable they will be with your hand.
D. Then gently roll them over so they are lying on their back. Hold and burp your baby together until they become comfortable.

Rendered prompt

[header] How to burp a sleeping baby [title] Hold your baby and burp them. [step] This technique is good for babies who sleep on their stomach or who like to cuddle when they sleep. [substeps] Slowly move your baby next to your body so you do not wake them up.

Model answer

Extracted: A · Gold: A

sample 5 · hellaswag:00010031 · pass · 100.0% · 4399ms · 34999dc38706

Question

[header] How to rent a limousine [title] Decide when you'll rent the limo. [step] Early summer is the peak of the season for hiring limousines. You will probably be able to secure a better price if you hire a limo outside the months of may and june.

A. You're also likely to get a better rate on weekdays. [substeps] Sunday through thursday are typically the least expensive days, while fridays are about 20% less than saturdays.
B. If you're looking for one in july or august, you might want to be the driver. [title] Decide whether to ask friends, relatives, or strangers.
C. [substeps] Many limousine offers large limousine packages for lower prices. However, the driver hiring a limousine for the funeral may not be the first limo you hire.
D. A limousine is usually on the streets or across city traffic and less hectic between the ages of september and july. Saturday taxis are typically used if you live in an area where it is hot.

Rendered prompt

[header] How to rent a limousine [title] Decide when you'll rent the limo. [step] Early summer is the peak of the season for hiring limousines. You will probably be able to secure a better price if you hire a limo outside the months of may and june.

Model answer

Extracted: A · Gold: A

sample 6 · hellaswag:00005104 · pass · 100.0% · 4404ms · ea6b5cd4a03e

Question

[header] How to dip dye dark hair [title] Protect yourself and work area from bleach stains. [step] Cover your counter with newspaper. Wrap a dyeing cape or an old towel around your shoulders and put on a pair of plastic gloves.

A. Lastly, make sure that you have all of your supplies laid out and ready to go. [title] Mix the bleach according to the instructions on the package.
B. [substeps] Deep dye dark hair requires heavy amounts of bleach. For deep dye, i recommend having a shirt with dark sleeves.
C. Wear a sturdy t-shirt and apron to protect yourself while you work. Put bleach bowls or a bucket into your sink so that you can come to an additional solution that's the same color as the dye.
D. Bleach is caustic and it affects your hair and eyes. [substeps] You can also wear a bandanna around your neck.

Rendered prompt

[header] How to dip dye dark hair [title] Protect yourself and work area from bleach stains. [step] Cover your counter with newspaper. Wrap a dyeing cape or an old towel around your shoulders and put on a pair of plastic gloves.

Model answer

Extracted: A · Gold: A

sample 7 · hellaswag:00005596 · pass · 100.0% · 4627ms · fb6cc22c3d0a

Question

[header] How to hire syrian refugees [title] Find your local resettlement office. [step] Refugees admitted into a new country are eligible to work immediately. Government programs assist refugees in finding employment as part of their resettlement programs.

A. If you identify yourself as a refugee, you must work as a refugee in the country. [title] Complete any required occupations (such as a position, career, or order) before applying for a job.
B. [substeps] Use the job search tools at the resource website: https: // www.resettlement-cepus.gov/workspace/guild-acanees. Cfm look on your city or town's website for a list of resettlement department locations.
C. [substeps] In the united states, you can find your local resettlement office by visiting https: // www.acf.hhs.gov/orr/state-programs-annual-overview. Click on your state on the map, or select your state from the drop-down menu below the map to get a list of local offices and contact information.
D. The national guard service reacts positively to refugees seeking work as part of their resettlement program, as they provide a much more productive working environment. [substeps] After visiting your local resettlement office, locate the website by typing " resettlement program " and your city and state into a search engine.

Rendered prompt

[header] How to hire syrian refugees [title] Find your local resettlement office. [step] Refugees admitted into a new country are eligible to work immediately. Government programs assist refugees in finding employment as part of their resettlement programs.

Model answer

Extracted: C · Gold: C

Incorrect samples

sample 1 · hellaswag:00001623 · fail · 0.0% · 4403ms · a061d3a6bb7b

Question

A man has climbed a large ladder outside. he

A. is using trimmers to cut and trim large trees.
B. ropes and lures a cow into the open.
C. is using it to pull himself up onto a platform where a car is parked.
D. is life saving at the bottom of the ladder.

Rendered prompt

A man has climbed a large ladder outside. he

Model answer

Extracted: C · Gold: A

sample 4 · hellaswag:00005294 · fail · 0.0% · 4403ms · 7f5b2215e320

Question

[header] How to use a highlighter stick [title] Choose an ivory or cream highlighter shade if you have fair skin. [step] If you have a pale complexion, cream and ivory highlighters with a pearlescent or icy-silver tinge work really well. While these shades tend to look ghostly and unnatural on other complexions, they add a natural-looking glow to fair skin.

A. [substeps] Look for highlighter shades that include words like " moonbeam, " " ice, " and " crystalline. " avoid shades that are darker than cream and ivory.
B. [title] Try a metallic highlighter shade to go with your outfit. [step] If you have a pale complexion, you may prefer something metallic.
C. [substeps] Ivory highlighters are quite pricey. They cost anywhere from $14-20.
D. They make blending difficult to maintain, so get creative and test out different color highlighters. [substeps] For darker skin colors, choose a similarly color highlighter.

Rendered prompt

[header] How to use a highlighter stick [title] Choose an ivory or cream highlighter shade if you have fair skin. [step] If you have a pale complexion, cream and ivory highlighters with a pearlescent or icy-silver tinge work really well. While these shades tend to look ghostly and unnatural on other complexions, they add a natural-looking glow to fair skin.

Model answer

Extracted: B · Gold: A

sample 8 · hellaswag:00001187 · fail · 0.0% · 4406ms · 7d9da675c4e0

Question

The woman stands still for a short while in her blue/white outfit. Then she lifts the heavy yellow weight and someone who's watching her shows peremptory by yelling. she

A. picks it back up and lifts it again, guzzling the last of the shiraz.
B. drops the weight and the face of female trainer starts into the screen.
C. holds it up for a good second and then drops it back down to the ground.
D. slowly lifts the weight up she long the overweight weight over her head, then drops the weight to the ground.

Rendered prompt

The woman stands still for a short while in her blue/white outfit. Then she lifts the heavy yellow weight and someone who's watching her shows peremptory by yelling. she

Model answer

Extracted: D · Gold: C

sample 13 · hellaswag:00000389 · fail · 0.0% · 4406ms · 2802a406f49a

Question

A young man is seen playing drums in front of an audience. other people

A. are seen speaking to the camera as well as running into the audience.
B. are watching as he continues to play.
C. are walking by back and fourth.
D. are shown playing guitar while the man plays more drums.

Rendered prompt

A young man is seen playing drums in front of an audience. other people

Model answer

Extracted: B · Gold: D

sample 15 · hellaswag:00008268 · fail · 0.0% · 4398ms · d96daca04434

Question

[header] How to discreetly breastfeed your baby on the go [title] Wear nursing clothing. [step] Nursing tops can be purchased at maternity stores and most major clothing retailers. These tops have hidden openings that allow you to easily feed your baby.

A. The lining on these dresses is designed to allow you to access the breast once your baby has grown accustomed to you. To remove the lining, simply slide it off and then gently remove it.
B. [substeps] Wear one that covers the breasts, but does not cover them completely. Many maternity tops are available in mesh or sheer fabrics.
C. You can get a variety of items such as tops, bras, and dresses. [substeps] Nursing bras that are made like sports bras are usually better.
D. When your baby is traveling, these have tiny baby items that you can remove and show your baby. [substeps] You can also move to a place where you will not be bothered by baby items.

Rendered prompt

[header] How to discreetly breastfeed your baby on the go [title] Wear nursing clothing. [step] Nursing tops can be purchased at maternity stores and most major clothing retailers. These tops have hidden openings that allow you to easily feed your baby.

Model answer

Extracted: A · Gold: C

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 12 · 7/1/2026, 1:17:01 PM · cmr23ny1404cvpi01mpgkrqsz

83.8%

311/371 correct · 5 correct traces · 5 incorrect traces

Correct samples

sample 1 · hellaswag:00007797 · pass · 100.0% · 3873ms · ca10ca786b1b

Question

[header] How to save money renting a car in the united states [title] Check online travel sites. [step] You can compare car rental prices online through most travel aggregators, such as orbitz, priceline, expedia, travelocity, and kayak. Car rental prices can fluctuate wildly, so check multiple sites regularly, starting at least several weeks before you need the car.

A. [substeps] If you combine your car rental with a flight and/or hotel booking, you can often save significantly more than by reserving them separately. [title] Compare prices directly from the major company websites.
B. [substeps] Paying online usually involves submitting a bill to the online car rental service. Check car websites and discuss customizable payment plans before finalizing the transaction.
C. [title] Sign up with a car sharing company. [step] Certain companies such as airtrain, lyft, lambordia, or banal are also carriers for car rental companies.
D. [title] Add up your driving and parking costs. [step] Be prepared to get a loan or borrow money from a friend or family member.

Rendered prompt

[header] How to save money renting a car in the united states [title] Check online travel sites. [step] You can compare car rental prices online through most travel aggregators, such as orbitz, priceline, expedia, travelocity, and kayak. Car rental prices can fluctuate wildly, so check multiple sites regularly, starting at least several weeks before you need the car.

Model answer

Extracted: A · Gold: A

sample 2 · hellaswag:00002368 · pass · 100.0% · 3884ms · e64052a5f5d3

Question

A woman is shown speaking to the camera while holding a dart. pictures of a man

A. 's arms are shown as well as this dart.
B. with a bowflex in one ear and a dart in her other ear next to a dart board in a room.
C. playing darts appear and he is shaking in distress before returning to talking to the camera.
D. are shown and leads into her throwing darts.

Rendered prompt

A woman is shown speaking to the camera while holding a dart. pictures of a man

Model answer

Extracted: D · Gold: D

sample 3 · hellaswag:00003525 · pass · 100.0% · 3888ms · 2637f8600bc9

Question

[header] How to wear a bomber jacket [title] Consider fit. [step] Because they started as a utilitarian piece for servicemen strapped with equipment, they're naturally oversized. However, designers haven't been afraid to play around with the bomber jacket design.

A. You've got tons of options-from thick american blazers to ¾-1-come-out bomber jackets. You just need to try them on and see if one fits you perfectly.
B. In fact, they've made many custom bomber jackets fit for military situations in the past. [substeps] Bomber jackets are wide-brimmed, functional, and tend to be less oversized.
C. You can wear them as an oversized piece, but you can also find them in slim fitting versions, cropped versions, and so on. [substeps] Getting the proper fit is key to rocking this look.
D. Try a simple button-down and it can work great. You can use a plain button-down instead if you want to wear some seasonal " steampunk " materials.

Rendered prompt

[header] How to wear a bomber jacket [title] Consider fit. [step] Because they started as a utilitarian piece for servicemen strapped with equipment, they're naturally oversized. However, designers haven't been afraid to play around with the bomber jacket design.

Model answer

Extracted: C · Gold: C

sample 5 · hellaswag:00003679 · pass · 100.0% · 3880ms · 1d6c41ac9865

Question

[header] How to clean hairbrushes [title] Remove the hair from the brush. [step] Taking out the old hair is always a good idea, since dirt and skin particles tend to stick to it as it builds up in your brush. Reach in and pinch the hair with your fingers, then pull it out from around the bristles.

A. [substeps] Brushes that are attached to anything (such as the brush) will be clean and free of dirt and bacteria, but brushes with metal can damage them and produce residue. [title] Choose one that is easy to clean.
B. [substeps] These small pinches are usually easy to remove or you can simply take them out. Removing used hair from an old hairbrush, just may take some effort to get it off of the handle.
C. Keep doing this until you've extracted all of the hair from the brush, then throw the hair away. [substeps] If your brush's bristles are made from natural fibers, take care not to yank them out with the hair.
D. [substeps] Most brushes have a teeth crown screw or other attachment. If not, put the brush in to remove the crown.

Rendered prompt

[header] How to clean hairbrushes [title] Remove the hair from the brush. [step] Taking out the old hair is always a good idea, since dirt and skin particles tend to stick to it as it builds up in your brush. Reach in and pinch the hair with your fingers, then pull it out from around the bristles.

Model answer

Extracted: C · Gold: C

sample 7 · hellaswag:00005557 · pass · 100.0% · 3870ms · fd9e38b8c08f

Question

[header] How to tie a doo rag [title] Place the durag on your head. [step] You can choose a color and style of durag that work for you. Many people prefer doo rag that are a stretchy material that can be seen through when stretched.

A. They look like long cargo pants or long jacket pieces. Your crotches will have to be about even or slightly larger than the durag's elastic.
B. This makes them more breathable when they're tied tightly around your head. [substeps] Line the center seam up with the center of your head for symmetry.
C. The durag should be loose enough so that you can loosely wrap the rag around your head. [substeps] Your durag may be hard to place with one hand or with two hands so you need to make sure that it will allow you to handle it.
D. Use a product like dawww.dahoo.com or zieforednwearer if you want a casual cut. Durag fabric is also often called sewn cloth and is woven with hemp or microfiber.

Rendered prompt

[header] How to tie a doo rag [title] Place the durag on your head. [step] You can choose a color and style of durag that work for you. Many people prefer doo rag that are a stretchy material that can be seen through when stretched.

Model answer

Extracted: B · Gold: B

Incorrect samples

sample 4 · hellaswag:00001285 · fail · 0.0% · 6764ms · 30d7069fbc34

Question

The players stand around talking to one another and one shoots a bow and arrow at a target. more people

A. watch while the owners wait.
B. are seen shooting the bows followed by interviewers speaking and team mates reacting.
C. walk by and the players talk to each other in the background.
D. come and go behind the referee who has just come on stage to take picture and record the game.

Rendered prompt

The players stand around talking to one another and one shoots a bow and arrow at a target. more people

Model answer

Extracted: D · Gold: B

sample 6 · hellaswag:00002729 · fail · 0.0% · 3886ms · 25a7613dfe9e

Question

A clip begins and it features people running and the words on the screen say they are novices who are training to run a marathon in tel aviv, israel. a man

A. is seen speaking in a temple.
B. is then shown running and the words say can be seen on the screen again saying he has some good news.
C. appears training them and different marathoners in training take turns being interviewed while they continue to switch back and forth to everyone going through the training.
D. is seen standing in front of a street and he begins to perform an act while holding the statue and flipping back and fourth.

Rendered prompt

A clip begins and it features people running and the words on the screen say they are novices who are training to run a marathon in tel aviv, israel. a man

Model answer

Extracted: B · Gold: C

sample 8 · hellaswag:00002795 · fail · 0.0% · 4311ms · dd25d87c8fb2

Question

A towel and glass are on a table next to a piece of foam and metal. A tool is being shown by a man holding it. the metal

A. is cooled into a liquid and garnishes the fencing material.
B. is separated from the foam and put over a bath tub before being pressed with sponge and rinsed off.
C. is used to sharpen a knife in front of him.
D. is being used on a piece of cloth, cloth, toothbrush and shirt.

Rendered prompt

A towel and glass are on a table next to a piece of foam and metal. A tool is being shown by a man holding it. the metal

Model answer

Extracted: A · Gold: C

sample 11 · hellaswag:00001467 · fail · 0.0% · 3883ms · 5b7790382de9

Question

A girl runs down a track in slow motion. she

A. is throwing a javelin over the bar.
B. smashes into a sand pit several times.
C. takes a huge jump, hurling herself over a bar.
D. jumps into a sand box and begins to play a game of hopscotch.

Rendered prompt

A girl runs down a track in slow motion. she

Model answer

Extracted: A · Gold: C

sample 12 · hellaswag:00006499 · fail · 0.0% · 3870ms · a30d372168a2

Question

[header] How to make your hair stay straight all day [title] Use shampoos and conditioners that help straighten hair. [step] Look for shampoos and conditioners that have lots of vitamins and minerals in them, giving your hair refreshing nutrients that will help smooth it out. These are usually marketed as smoothing, sleek, or straightening products.

A. [substeps] Look for a shampoo that lists " hydrating ingredients ", which are sodium and other all natural ingredients. You can also try going for one that specifically states " hydrating " on the bottle.
B. [substeps] If you have very curly hair, you may need to consider using a straightening shampoo. You can also purchase a hair-straightening powder and conditioners to help moisturize your hair.
C. If you have dry hair, look for shampoos that will keep it soft, wet, and shine. If you have hair that is frizzy, opt for sleek, volumizing, or foam shampoos.
D. [substeps] Look for ingredients such as wheat proteins, pro vitamin b5, or hedera helix extract. Take the time to comb the conditioner into your hair while in the shower, using your fingers or a wide-toothed comb.

Rendered prompt

[header] How to make your hair stay straight all day [title] Use shampoos and conditioners that help straighten hair. [step] Look for shampoos and conditioners that have lots of vitamins and minerals in them, giving your hair refreshing nutrients that will help smooth it out. These are usually marketed as smoothing, sleek, or straightening products.

Model answer

Extracted: B · Gold: D

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 11 · 7/1/2026, 1:11:56 PM · cmr23hed5041ypi015a1xkbmh

78.2%

290/371 correct · 5 correct traces · 5 incorrect traces

Correct samples

sample 3 · hellaswag:00002205 · pass · 100.0% · 3643ms · 9fde6021f549

Question

We see a man and the intro scene. A man in a kitchen has knifes and a sharpening block. the man

A. grinds the knifes through a rod.
B. begins sharpening the arm of a woman and releases.
C. wets the block and runs a knife across two different blocks.
D. then gets to chopping vegetables mainly in front of a counter with other scenes working.

Rendered prompt

We see a man and the intro scene. A man in a kitchen has knifes and a sharpening block. the man

Model answer

Extracted: C · Gold: C

sample 4 · hellaswag:00006200 · pass · 100.0% · 6761ms · 24dc8d273221

Question

[header] How to wear yoga pants to work [title] Invest in high-quality yoga pants. [step] If you're wearing yoga pants to work, you want them to be durable while also providing comfort. Try to find yoga pants made of a thicker material, and try them on before purchasing them to ensure they aren't see-through.

A. High-quality yoga pants will likely be designed for yoga or other types of work. [title] Stick to a thin, stretchy, short-waist.
B. Sheer yoga pants are definitely something to avoid in the workplace. [substeps] Yoga leggings offer the most outfit options and tend to look the best in a professional work environment.
C. [substeps] Make sure the pants are specifically made for yoga, preferably ones with natural fibers, such as cotton or silk. When in doubt, look for yoga pants in solid colors.
D. Yoga pants come in a variety of shapes and sizes, and none of them fit under normal circumstances. See the rough guide section for more specific instructions.

Rendered prompt

[header] How to wear yoga pants to work [title] Invest in high-quality yoga pants. [step] If you're wearing yoga pants to work, you want them to be durable while also providing comfort. Try to find yoga pants made of a thicker material, and try them on before purchasing them to ensure they aren't see-through.

Model answer

Extracted: B · Gold: B

sample 5 · hellaswag:00002478 · pass · 100.0% · 3645ms · 303275c321c6

Question

The ingredients are added to a tub and scoops of ice cream are taken from a tub after freezing. a bowl

A. is filled with ice cream.
B. of ice cream is added to the bowl and melted in four different pans.
C. of olive oil is added as a spoon and bowls of butter are added.
D. is used to rack ice cream.

Rendered prompt

The ingredients are added to a tub and scoops of ice cream are taken from a tub after freezing. a bowl

Model answer

Extracted: A · Gold: A

sample 6 · hellaswag:00003152 · pass · 100.0% · 6761ms · 4b76423bef3b

Question

A young man is outside attempting to do a highjump behind the london logo. A male man then stands in front of the mat and talks about the jump. another male

A. is then shown as he takes off and clears the bar of the high jump and there's an instant replay.
B. appears in before he is cased six times attempting to same doing the same jump in different locations and taking short breaks.
C. joins in and the two jump from different heights, falling hundreds of times when landing.
D. joins in and prepares to get his jump in acting steadier and hitting the mat.

Rendered prompt

A young man is outside attempting to do a highjump behind the london logo. A male man then stands in front of the mat and talks about the jump. another male

Model answer

Extracted: A · Gold: A

sample 8 · hellaswag:00006290 · pass · 100.0% · 3640ms · 8f9177b33146

Question

[header] How to play sharks and minnows [title] Pick a shark. [step] Choose a player to be the shark. This player is now " it " and stands in the center of the pool (the " ocean ").

A. They raise their non-dominant arm straight into the water, like diving, and perform the shark trick. The shark's eyes use this trick to reflect waves and also give you a full view of the shark, since you can't see him from the board.
B. [substeps] When playing with a lot of people in a large area, you can play with more than one shark. 10 minnows to 1 shark is a good ratio.
C. He/she is mostly responsible for bringing the dead shark back to the water at the end of each round. [substeps] The top priority is the decision whether or not to leave the pool.
D. You cannot choose sharks from a different species (e.g. they all have different colors, fangs, rings, claws, like fish) or any other species.

Rendered prompt

[header] How to play sharks and minnows [title] Pick a shark. [step] Choose a player to be the shark. This player is now " it " and stands in the center of the pool (the " ocean ").

Model answer

Extracted: B · Gold: B

Incorrect samples

sample 1 · hellaswag:00009544 · fail · 0.0% · 3639ms · ed7d3a94464d

Question

[header] How to take action against capital punishment in your state [title] Focus your action on a specific case or cases. [step] It may help you to focus your efforts on a couple of current capital cases in your state. Focusing your work on a couple of specific cases may be more effective than tackling several cases or state laws as a whole.

A. However, you should also stay on top of general issues related to capital punishment. Combining this knowledge can help you become a very informed participant in the action against these cases, which in turn may help alter capital punishment laws.
B. Some of the cases in your state are relatively minor compared to other cases in your state. Still, you should aim to focus your efforts on a very specific case at all costs.
C. [substeps] You can look up places to locate a case. There is a wide variety of legal resources online that can help you locate the most common cases you may want to worry about.
D. [substeps] If you believe capital punishment in any of your cases is unjust, find a law that applies to those cases. Look at the organizations's forum links.

Rendered prompt

[header] How to take action against capital punishment in your state [title] Focus your action on a specific case or cases. [step] It may help you to focus your efforts on a couple of current capital cases in your state. Focusing your work on a couple of specific cases may be more effective than tackling several cases or state laws as a whole.

Model answer

Extracted: B · Gold: A
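The "Extracted" letter in each trace is the ending the model scored highest, and "Gold" is the dataset label. Under continuation loglikelihood scoring, each of the four endings is appended to the rendered prompt and the ending with the best log-probability wins. The sketch below illustrates that selection step only. The function name `pick_ending`, the optional length normalization, and the numbers in the usage note are assumptions for illustration, not the harness's actual code or real scores from this run.

```python
from typing import Optional, Sequence

LETTERS = "ABCD"


def pick_ending(
    total_logprobs: Sequence[float],
    lengths: Optional[Sequence[int]] = None,
) -> str:
    """Return the letter of the highest-scoring ending.

    total_logprobs: summed token log-probabilities of each ending,
        conditioned on the rendered prompt (hypothetical input).
    lengths: optional per-ending lengths; when given, scores are
        divided by length (an assumed normalization, not confirmed
        to be what this harness does).
    """
    if lengths is None:
        scores = list(total_logprobs)
    else:
        scores = [lp / n for lp, n in zip(total_logprobs, lengths)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LETTERS[best]
```

For example, with made-up totals `[-42.0, -35.5, -51.2, -60.3]` the function returns `"B"`; a trace is marked pass only when that letter equals the gold label.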

sample 2 · hellaswag:00000706 · fail · 0.0% · 3644ms · 92983ab87d7a

Question

The boy wipes down the sole of the tennis shoe using the wet wipe towel. The boy dries off the shoe using a towel. the boy

A. rinses the shoe inside the folded towel with water.
B. picks up a high top pair and wipes it off with the same wet towel.
C. tapes down the rear portion of the shoe.
D. shakes his foot with music and the camera follows the movements.

Rendered prompt

The boy wipes down the sole of the tennis shoe using the wet wipe towel. The boy dries off the shoe using a towel. the boy

Model answer

Extracted: C · Gold: B

sample 7 · hellaswag:00009865 · fail · 0.0% · 4166ms · 80bdeb7973ed

Question

[header] How to make natural shampoo for your hair type [title] Boil 1 cup of spring water. [title] Remove the pan from the heat, and add 2 tablespoons of dried chamomile. [title] Let steep for 15 minutes.

A. [title] Obtain suitable shampoo for your hair type. [step] These shampoos and conditioners are very absorbent and therefore should be dried separately for longer.
B. [title] Strain the herbs from the liquid, and throw away the herbs. [title] Let the liquid sit for about 20 minutes, allowing it to completely cool down.
C. [title] Let cool a little but leave the liquid in for 1-2 minutes. [title] Strain the liquid and store in a cool container to use again later.
D. [title] Test and see if your shampoo mixture will lather up. [step] Add more-combine half of the mixture with 3 tablespoons of water until it bubbles.

Rendered prompt

[header] How to make natural shampoo for your hair type [title] Boil 1 cup of spring water. [title] Remove the pan from the heat, and add 2 tablespoons of dried chamomile. [title] Let steep for 15 minutes.

Model answer

Extracted: A · Gold: B

sample 9 · hellaswag:00002393 · fail · 0.0% · 3644ms · e6a396ef8b4f

Question

The lady taps the puck with her stick before pushing it hard and pushing the puck across the court. the camera

A. gets itself angled to keep the puck from hitting the boy playing while playground equipment sits nearby.
B. zooms in on the puck and two women are standing next to it talking to each other.
C. zooms in on cupcake detail while the lady takes a breath.
D. follows the pucks which stops short and teeters before resting.

Rendered prompt

The lady taps the puck with her stick before pushing it hard and pushing the puck across the court. the camera

Model answer

Extracted: B · Gold: D

sample 13 · hellaswag:00000193 · fail · 0.0% · 3644ms · f742e417959c

Question

We see the title introducing us to jesse. jesse the dog

A. owner shows us how he walks and how he breathes.
B. runs and catches frisbee.
C. runs in circles where the dog is his mate.
D. is shown from a shampoo advertisement one last time.

Rendered prompt

We see the title introducing us to jesse. jesse the dog

Model answer

Extracted: A · Gold: B

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 10 · 7/1/2026, 1:10:33 PM · cmr23fmdt03r9pi01hoe610ys

81.1%

301/371 correct · 5 correct traces · 5 incorrect traces
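As a worked check on the figures above, the sketch below recomputes the shard accuracy from 301/371 and attaches a Wilson 95% score interval, the statistic the leaderboard is described as ranking by. The function name `wilson_interval` and the choice of z = 1.96 are assumptions; the page does not state the exact constants it uses.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion.

    z = 1.96 approximates a two-sided 95% interval (assumed here).
    Returns (lower, upper) as fractions in [0, 1].
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(301, 371)
print(f"accuracy {301 / 371:.1%}, Wilson 95% interval {low:.1%} to {high:.1%}")
```

A single 371-question shard gives a noticeably wider interval than the pooled leaderboard entry, which is why ranking on the lower bound favors models with more shards covered.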

Download all shard traces (JSONL)

Correct samples

sample 1 · hellaswag:00001464 · pass · 100.0% · 3797ms · ba0eec183742

Question

Several athletes compete for the javelin world championships. The russia dimitri is 3rd bronze medal in the competition. The japanese jenki is silver medal. till

A. javelin 1990 the russian javelin champion anna rubens is strong.
B. from germany won the gold medal.
C. loesos facility there is no competition.
D. now the french have won each javelin every round.

Rendered prompt

Several athletes compete for the javelin world championships. The russia dimitri is 3rd bronze medal in the competition. The japanese jenki is silver medal. till

Model answer

Extracted: B · Gold: B

sample 2 · hellaswag:00005973 · pass · 100.0% · 3418ms · c98299ff5733

Question

[header] How to get japanese characters (kanji, hiragana, katakana) on firefox [title] In the firefox toolbar, go to view] character encoding] auto-detect. [step] While most browsers are already capable of reading international characters, you may simply need to help firefox detect them. [title] Select japanese.

A. [step] By default, most of the newer characters on a firefox list are japanese (or katakana). However, you may have to reach up and back to find ones you'd like to include or that make similarities between each other.
B. [title] Focus your attention on chikatakana. [title] Choose the gender and intent of the character, ren.
C. [step] Some of the same characters can be found on the scroll bar. Selected japanese is the most commonly used japanese toolbar.
D. [step] Your firefox should now be able to detect japanese characters. If not, you may need to install a language pack using one of the methods below.

Rendered prompt

[header] How to get japanese characters (kanji, hiragana, katakana) on firefox [title] In the firefox toolbar, go to view] character encoding] auto-detect. [step] While most browsers are already capable of reading international characters, you may simply need to help firefox detect them. [title] Select japanese.

Model answer

Extracted: D · Gold: D

sample 3 · hellaswag:00006664 · pass · 100.0% · 3418ms · e5e3b6eec41d

Question

[header] How to make cappuccino foam [title] Pour brewed espresso into your cappuccino cup. [step] Although specifics will depend on the type of cappuccino and the person making it, cappuccinos are roughly one quarter espresso and three quarters steamed milk. Pour your desired amount of brewed espresso into a large cappuccino cup before you steam the milk.

A. [substeps] Instead of cappuccino add straight milk. Espresso can be iced later, and most cappuccino companies encourage it to be iced after brewing to sweeten the milk.
B. That way, you'll have everything ready so you can pour out the steamed milk while it's still piping hot. [title] Fill a steam pitcher with cold milk.
C. This will preserve your milk-stirring capacity throughout the rest of the process. [substeps] Try the recommended amount of coffee to get the right amount.
D. Start off with hot to ensure the coffee in your cup is sufficiently cold to immerse the foam. [substeps] It's best to use skim milk.

Rendered prompt

[header] How to make cappuccino foam [title] Pour brewed espresso into your cappuccino cup. [step] Although specifics will depend on the type of cappuccino and the person making it, cappuccinos are roughly one quarter espresso and three quarters steamed milk. Pour your desired amount of brewed espresso into a large cappuccino cup before you steam the milk.

Model answer

Extracted: B · Gold: B

sample 5 · hellaswag:00001931 · pass · 100.0% · 3418ms · fb41bcc882aa

Question

A boy holds up a bottle of mouth wash. he

A. bends down and throws it.
B. takes the mouthwash and blows his horn.
C. uses the mouthwash to wash himself and two other teens.
D. drinks a cap full of the mouth wash.

Rendered prompt

A boy holds up a bottle of mouth wash. he

Model answer

Extracted: D · Gold: D

sample 6 · hellaswag:00007682 · pass · 100.0% · 3419ms · f871412b315b

Question

[header] How to wash newborn hair [title] Rinse your newborn's hair for the first bath. [step] Because the hair and skin of a newborn are so delicate and sensitive, you don't actually need to use shampoo or cleansers. For your newborn's first bath, just pour a little plain water over their scalp.

A. Just enough so that the hair and skin are fully saturated. [substeps] This is also an optional step that you can do while breastfeeding.
B. You will make sure that they have nothing like a toothbrush or comb or baby oil in their shampoos. [title] Repeat the soap and water process with your baby's hair.
C. Pat the newborn's scalp dry with a soft towel. [substeps] You don't need to scrub, massage, or really wash the scalp until they're a little older.
D. For added comfort, you can place the newborn against a towel lined with skin-safe bath paper. [title] Combine the shampoo and water in a small bowl.

Rendered prompt

[header] How to wash newborn hair [title] Rinse your newborn's hair for the first bath. [step] Because the hair and skin of a newborn are so delicate and sensitive, you don't actually need to use shampoo or cleansers. For your newborn's first bath, just pour a little plain water over their scalp.

Model answer

Extracted: C · Gold: C

Incorrect samples

sample 4 · hellaswag:00008041 · fail · 0.0% · 3415ms · c715921c871a

Question

[header] How to get an adult male to use his manners [title] Consider what direct and indirect demands you have been making. [step] Reflect on the balance of personal demands you are making to your partner, and evaluate whether that balance is just. [title] Be prepared to evaluate your viewpoint of the situation as objectively as possible.

A. [title] First, determine whether your partner was accurate in his assessment of his sexual capabilities and responses. [title] Discuss there are a number of different ways that men in modern society can show discretion, professionalism, and safety in the absence of sexual restraint.
B. [step] You have also made a choice that your partner must want and must follow. When you record and evaluate your demands, be sure you are or need to be able to speak with those who are making them.
C. [title] Consider your own perspective, not personal. [step] Communicating your viewpoint will provide you with invaluable resources.
D. [step] Don't go into the impending talk convinced that anything he might say is either wrong or not heartfelt. That won't help to solve the problem.

Rendered prompt

[header] How to get an adult male to use his manners [title] Consider what direct and indirect demands you have been making. [step] Reflect on the balance of personal demands you are making to your partner, and evaluate whether that balance is just. [title] Be prepared to evaluate your viewpoint of the situation as objectively as possible.

Model answer

Extracted: A · Gold: D

sample 13 · hellaswag:00003095 · fail · 0.0% · 3418ms · 682abd21b892

Question

A man stands in the middle of a stadium. crowds

A. are gathered around him as he spins around.
B. watch and cheer the man on.
C. watch from behind a curtain.
D. flash around him.

Rendered prompt

A man stands in the middle of a stadium. crowds

Model answer

Extracted: A · Gold: B

sample 21 · hellaswag:00008766 · fail · 0.0% · 3416ms · a32e88c9134b

Question

[header] How to get natural looking waves [title] Wash or mist your hair. [step] When braiding your hair to create waves, it is best to begin with damp hair. You may achieve this in one of two ways : [substeps] Wash your tresses prior to braiding your hair.

A. Tie off the ends of the braid and then let the loose hair dry slightly before washing it. Gradually do this for all hair types-curly, straight, with some parting.
B. When you begin braiding your hair, do not pat your locks dry. Instead, work a small amount of shampoo (coconut hair shampoo) into your hands and massage the shampoo into your hair.
C. In one, slightly wet your hair, then mist it with water. In the other, leave your hair damp and then mist your tresses gently with the palm of your hand.
D. Mist your hair with lukewarm water. [title] Combat frizz and snarls.

Rendered prompt

[header] How to get natural looking waves [title] Wash or mist your hair. [step] When braiding your hair to create waves, it is best to begin with damp hair. You may achieve this in one of two ways : [substeps] Wash your tresses prior to braiding your hair.

Model answer

Extracted: B · Gold: D

sample 24 · hellaswag:00000385 · fail · 0.0% · 3420ms · df0d7fd9ff6f

Question

A small group of people are seen shaking hands and standing together followed by a man blowing a whistle and people running around. the people

A. then play a game of soccer with one another, running up and down the sandy field while coaches yell on the side and a goal is blocked.
B. continue to run around under the stage and close up of a man speaking and shows people playing a game.
C. move around with drums and hitting a tune in the middle of the gym and the man continues to blow a whistle and lead into more people walking past on the side.
D. continue riding around on skateboards as well as following behind one another as well as people speaking to the camera.

Rendered prompt

A small group of people are seen shaking hands and standing together followed by a man blowing a whistle and people running around. the people

Model answer

Extracted: D · Gold: A

sample 30 · hellaswag:00006161 · fail · 0.0% · 3794ms · 4279c2636d71

Question

[header] How to calm a vicious rabbit [title] Show the rabbit it's hurting you if it bites. [step] Make a sudden yelping sound or squeal when and if your rabbit bites or nips you. This will act as a kind of signal that you're in pain, and your rabbit will associate that with biting you.

A. Be sure to discourage biting due to its obedient nature. [substeps] You can also make it less likely that your rabbit will bite you by rubbing it against something, such as a tree trunk, a post (or other object), or a cat.
B. [substeps] Place the muzzle over your rabbit's mouth, blocking its airways, and apply firm but gentle pressure. Continue holding the muzzle in place until your rabbit calms down.
C. [substeps] Tiny nips can just be your rabbit's way of telling you to go away, or that you're bugging it. They're not trying to hurt you, just trying to let you know they don't want to be touched or handled.
D. [title] Scold the rabbit if it bites or nips you. [step] If the rabbit bites you, you should immediately bring it to a veterinarian for medical attention.

Rendered prompt

[header] How to calm a vicious rabbit [title] Show the rabbit it's hurting you if it bites. [step] Make a sudden yelping sound or squeal when and if your rabbit bites or nips you. This will act as a kind of signal that you're in pain, and your rabbit will associate that with biting you.

Model answer

Extracted: D · Gold: C

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 9 · 7/1/2026, 1:09:18 PM · cmr23e0jm03gipi011als85mf

79.8%

296/371 correct · 5 correct traces · 5 incorrect traces

Download all shard traces (JSONL)

Correct samples

sample 1 · hellaswag:00001856 · pass · 100.0% · 4681ms · 7dc0758352f8

Question

A group of men stand on top of logs, chopping away. The chopping legend speaks to the camera again. a couple of men

A. are in a group near a river, talking.
B. stand on the log top wearing protective gear.
C. chop some wood until one of them jumps off the log and raises his hands up in excitement.
D. are hanging up rocks above a river.

Rendered prompt

A group of men stand on top of logs, chopping away. The chopping legend speaks to the camera again. a couple of men

Model answer

Extracted: C · Gold: C

sample 3 · hellaswag:00007942 · pass · 100.0% · 4677ms · a4d6d883f5bb

Question

[header] How to choose an alternative to a flower girl [title] Invite other members of the family to join your wedding party. [step] Instead of having young flower girls, consider using older women in your family, such as your grandma or great aunt. They'll love to be included in a special way, and the sight of your grandma throwing flower petals into the air is to sure to bring a smile to everyone's face.

A. [substeps] Older women can even invite their significant other as well. Just be sure the couple makes time to discuss the details in advance.
B. [title] Let everyone know you're having a flower girl themed party. [step] In several ways, this will make your party more fun.
C. [title] Read opinions from other members of the family. [step] Reading magazines and articles on the flower girl instead of relying on your friends for opinions can help you understand what happened that night in the picture.
D. [title] Give the role to a friend who volunteers. [step] One of your adult friends may find the idea of being a flower girl enchanting.

Rendered prompt

[header] How to choose an alternative to a flower girl [title] Invite other members of the family to join your wedding party. [step] Instead of having young flower girls, consider using older women in your family, such as your grandma or great aunt. They'll love to be included in a special way, and the sight of your grandma throwing flower petals into the air is to sure to bring a smile to everyone's face.

Model answer

Extracted: D · Gold: D

sample 5 · hellaswag:00005966 · pass · 100.0% · 4679ms · 5bb3a5ea3513

Question

[header] How to make resurrection rolls [title] Pour the milk into a bowl. [step] To activate the yeast, you need to mix it with a warm liquid. Add ½ cup (118 ml) of warm milk to the bowl of a stand mixer.

A. [substeps] For a longer process, you might want to run the yeast in the sink before proceeding. [title] Gradually stir the milk and mixture with a whisk.
B. Mix until the milk is very light (around 110 ml). [substeps] If the milk is too soft for this recipe, add 1/2 cup (120 ml) of sour cream.
C. The milk should be 105 ° f (41 ° c). [substeps] You can use 1% or 2% milk, but whole milk usually results in the best rolls.
D. [substeps] If you have a hand mixer, you can make the dough for the roll yourself. You'll only need to make 2 cups (500 ml) of milk.

Rendered prompt

[header] How to make resurrection rolls [title] Pour the milk into a bowl. [step] To activate the yeast, you need to mix it with a warm liquid. Add ½ cup (118 ml) of warm milk to the bowl of a stand mixer.

Model answer

Extracted: C · Gold: C

sample 6 · hellaswag:00003322 · pass · 100.0% · 4681ms · c48fcbbe29da

Question

[header] How to wear a sports bra [title] Select a sports bra made of moisture-wicking material. [step] You want your sports bra to be made of moisture-wicking material that is breathable. Most new sports bras these days contain technology to wick away sweat, making them the ideal choice for working out.

A. [title] Choose a sports bra made of breathable materials. [step] High-quality sports bras should not be too loose-fitting because this could send heat into the necessary space and encourage sweat from the hard to reach tissues in your body.
B. This makes them perfect for work or after sports practice. [substeps] In addition to sweat, sports bras also help to protect your arm and leg from heat, and can go well under shorts or other form-fitting gear.
C. [substeps] Not everyone has how much of the material they need for a workout in their bras. Less is more, as sport bras tend to last longer.
D. Try to steer clear of cotton, which tends to soak up moisture and stay wet. [substeps] Choosing a moisture-wicking material will also help to regulate the temperature of your body as you're working out.

Rendered prompt

[header] How to wear a sports bra [title] Select a sports bra made of moisture-wicking material. [step] You want your sports bra to be made of moisture-wicking material that is breathable. Most new sports bras these days contain technology to wick away sweat, making them the ideal choice for working out.

Model answer

Extracted: D · Gold: D

sample 7 · hellaswag:00002823 · pass · 100.0% · 8414ms · 18b765afcb41

Question

A black screen appears with red and white letters on it that say "discus throw example of correction of a delivery problem for a beginner". a young lady

A. takes her fist into the air and jumps to the right, fast, catching several discs in the air.
B. walks into a white room wearing a black jacket and reappears with an aluminum fork in her hand.
C. appears for the discus throw classroom at some point in between.
D. is now standing on a concrete circle and she begins swinging her red discus, then swings her body and then throws it.

Rendered prompt

A black screen appears with red and white letters on it that say "discus throw example of correction of a delivery problem for a beginner". a young lady

Model answer

Extracted: D · Gold: D

Incorrect samples

sample 2 · hellaswag:00008393 · fail · 0.0% · 4679ms · 95202ba8c559

Question

[header] How to use movies to help kids build character [title] Pick movies by age group. [step] While some younger kids are more mature, generally, you don't want a story line that will go right over their head. Similarly, you don't want to lose older kids by showing them movies that are too simplistic.

A. [substeps] Often, the best movie lines are set in the last century. While you may have super teens, you can find lots of great movies for kids online.
B. If possible, choose movies by kids you love, not kids they already like. [substeps] For instance, in the movie audrey hepburn 2 2000, the tables at breakfast [[ (18.
C. Try to pick something that works for the age the kid is at. [substeps] For kids 2 to 7 years old, make sure the story is fairly simple, and that plot supports the idea you're trying to get across, as it needs to be glaringly obvious.
D. Most of the time, kids get what they want early on. Take your age group into consideration as you come up with a few choices for your movie series.

Rendered prompt

[header] How to use movies to help kids build character [title] Pick movies by age group. [step] While some younger kids are more mature, generally, you don't want a story line that will go right over their head. Similarly, you don't want to lose older kids by showing them movies that are too simplistic.

Model answer

Extracted: D · Gold: C

sample 4 · hellaswag:00007936 · fail · 0.0% · 4676ms · 661a83c3c67a

Question

[header] How to care for your skin before and after a brazilian wax [title] Prepare 24 hours before wax. [step] The day before your wax, exfoliate the area. You don't need to use a special scrub.

A. You can exfoliate regularly with a loofah, facial bar, or regular washcloth. [substeps] Do not exfoliate right before you wax your brows.
B. All you need is a washcloth or bath pouffe with soap. Make sure to do this at least a day before your wax.
C. [substeps] As you rub on and dry your skin afterwards, avoid washing with soap or body wash. [title] Exfoliate with a facial cleanser.
D. [substeps] To get the most benefit, scrub with a pumice stone. If you are not exfoliating, begin by exfoliating with a pumice stone the day before wax.

Rendered prompt

[header] How to care for your skin before and after a brazilian wax [title] Prepare 24 hours before wax. [step] The day before your wax, exfoliate the area. You don't need to use a special scrub.

Model answer

Extracted: D · Gold: B

sample 8 · hellaswag:00008354 · fail · 0.0% · 4677ms · 7baba6e236e5

Question

[header] How to eat pumpkin seeds [title] Preheat oven to 375 ° f (190 ° c). [title] Separate any pumpkin seeds from the pumpkin flesh. [step] The best way to do this is by hand, or perhaps by using an old (cleaned) comb that you wouldn't mind getting dirty.

A. [title] Spread a sticky oil over this pumpkin and spread this onto your hands. [step] Allow to absorb, or sit for longer if you'd like.
B. Use the comb to separate the pumpkin seeds from the fibrous, fleshy parts. [title] If you choose to, clean the outer shell of the pumpkin seeds.
C. [title] Pour 1 teaspoon of pumpkin flesh into a glass. [step] Alternatively, you can use a spoon to gently separate the pumpkin seeds from the pumpkin flesh.
D. Rinse all pumpkin seeds thoroughly in a warm and soapy sink and pat dry with paper towels. [title] Scoop the pumpkin seeds into the mini muffin muffin pans or mini paper baskets.

Rendered prompt

[header] How to eat pumpkin seeds [title] Preheat oven to 375 ° f (190 ° c). [title] Separate any pumpkin seeds from the pumpkin flesh. [step] The best way to do this is by hand, or perhaps by using an old (cleaned) comb that you wouldn't mind getting dirty.

Model answer

Extracted: C · Gold: B

sample 10 · hellaswag:00006530 · fail · 0.0% · 4678ms · 0486300da8a0

Question

[header] How to prepare for a photographic trip [title] Bring a camera or cameras. [step] If at all possible, it is a good idea to have a backup. If you have some type of mechanical or power failure, you will not be able to take any pictures without a backup.

A. You will be kneeling in front of a 75 watt bulb and your camera and a tripod might end up distracting you from viewing the landscape. [substeps] If you are planning to take a portrait, you will want to have a photograph of yourself and someone else and a camera that allows you to take screenshots or arrows.
B. [title] Be sure that you have a flash or a strobe. [step] Your camera may have that and if you are used to using that on camera flash, then use it.
C. A plastic camera can often deal with this problem. If you cannot bring a camera, you may want to ask a family member or friend to hold a spare camera for future use.
D. So make sure you have a backup ready. [substeps] You can either purchase or borrow a brush to use during your trip.

Rendered prompt

[header] How to prepare for a photographic trip [title] Bring a camera or cameras. [step] If at all possible, it is a good idea to have a backup. If you have some type of mechanical or power failure, you will not be able to take any pictures without a backup.

Model answer

Extracted: C · Gold: B

sample 20 · hellaswag:00009186 · fail · 0.0% · 8414ms · ec454149f677

Question

[header] How to make almond milk tea boba [title] Order or buy tapioca pearls. [step] If you do not have an asian market in your town, they can be found easily online by conducting a search engine inquiry. You may also find them at tea shops in some cities.

A. [substeps] The tapioca pearls are not as exotic as the other types. Some countries and regions offer bulbasaur pearls, which are available for purchase through costco and zapius.
B. Make sure to order whole " using this fresh grind " tapioca pearls as they contain ethylene and vitamin c, which act to make beets. [title] Cut the boba pearls into small pieces.
C. [substeps] Tapioca pearls may even come with an ingredient called tapioca coal. Purchase kosher walnut kernel pearls if you want to make almond milk tea.
D. [substeps] Do not buy quick-cooking boba, for best results. This tapioca is usually used as a thickener for soups, casseroles and stews.

Rendered prompt

[header] How to make almond milk tea boba [title] Order or buy tapioca pearls. [step] If you do not have an asian market in your town, they can be found easily online by conducting a search engine inquiry. You may also find them at tea shops in some cities.

Model answer

Extracted: C · Gold: D

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 8 · 7/1/2026, 1:07:57 PM · cmr23c9wy035tpi019t62ajda

83.6%

310/371 correct · 5 correct traces · 5 incorrect traces

Download all shard traces (JSONL)
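Because shards are disjoint, pooling the runs for shards 10, 9 and 8 amounts to summing their counts: 301 + 296 + 310 = 907 correct out of 3 × 371 = 1,113 questions, or about 81.5%. When reruns overlap, the page says the latest approved answer per question_id is the one that counts. The sketch below shows one way to pool under that rule. The record fields `question_id`, `correct` and `submitted_at` are hypothetical names, not the actual JSONL schema of the downloadable traces.

```python
def pool_by_question(records):
    """Pool results by unique question_id, keeping the latest answer.

    Each record is a dict with hypothetical keys:
      question_id  - unique question identifier
      correct      - True if the extracted answer matched gold
      submitted_at - any sortable timestamp
    Returns (correct_count, unique_question_count).
    """
    latest = {}
    for rec in records:
        prev = latest.get(rec["question_id"])
        # A later submission replaces an earlier one for the same question.
        if prev is None or rec["submitted_at"] >= prev["submitted_at"]:
            latest[rec["question_id"]] = rec
    correct = sum(1 for rec in latest.values() if rec["correct"])
    return correct, len(latest)


# Toy data: q1 was rerun and its later answer is the one that counts.
records = [
    {"question_id": "q1", "correct": False, "submitted_at": 1},
    {"question_id": "q1", "correct": True, "submitted_at": 2},
    {"question_id": "q2", "correct": True, "submitted_at": 1},
]
print(pool_by_question(records))
```

Keying on question_id rather than on run keeps a rerun of the same shard from counting its questions twice.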

Correct samples

sample 1 · hellaswag:00008131 · pass · 100.0% · 4404ms · a4ac87a8acc9

Question

[header] How to flex lats [title] Learn how to isolate your lats when flexing. [step] Your lats are a pair of wide muscles that run from your armpits to your lower spine, wrapping around the back of your ribcage just beneath your shoulder blades. To figure out how to isolate your lats, first stand with your feet together.

A. Your toes should be together, with your butt facing downward. Close your abdominal muscles in the groin to get into this position.
B. Hold your ankles firmly in your right hand and your ankles together with your left hand with your hands overlapped, in a criss cross pattern of parallel and vertical lines. [title] Use a ruler to identify your lats.
C. Squat forward so that your back and front ribs are aligned and your toes are parallel with the floor. Raise your elbows at an angle and bend your elbow at 90 degrees.
D. Arch your lower back so that your chest is pushed forward and your butt is pushed out. Spread your upper back by lifting your arms slightly away from your sides and holding your hands about 6 inches (15 cm) in front of your hips.

Rendered prompt

[header] How to flex lats [title] Learn how to isolate your lats when flexing. [step] Your lats are a pair of wide muscles that run from your armpits to your lower spine, wrapping around the back of your ribcage just beneath your shoulder blades. To figure out how to isolate your lats, first stand with your feet together.

Model answer

Extracted: D · Gold: D

sample 3 · hellaswag:00009876 · pass · 100.0% · 4400ms · 019b1852fb0c

Question

[header] How to show cleavage [title] Wear tops that show off your cleavage. [step] While this may seem like a given, the right top is absolutely essential when showing cleavage. You can't show something that's covered up.

A. Also, make sure your tops are not g-rated, meaning those aren't appropriate for a particular time of day, or if you have an event coming up soon. [substeps] If you love showing cleavage, you want to show off that.
B. Make sure you're comfortable with your body in the top so you can show it up. [substeps] Studies show that flat strapless tops are considered the sexiest choice for showing cleavage.
C. While a turtleneck sweater may make your bust look fantastic, it will hide your cleavage. Make sure what you're wearing is cut for visible cleavage.
D. Ask a friend if she'd be okay with a top showing their cleavage, and if she says yes! [title] Go for a sleek cut. [step] The summer and winter months are the ideal time to show your cleavage.

Rendered prompt

[header] How to show cleavage [title] Wear tops that show off your cleavage. [step] While this may seem like a given, the right top is absolutely essential when showing cleavage. You can't show something that's covered up.

Model answer

Extracted: C · Gold: C

sample 4 · hellaswag:00003669 · pass · 100.0% · 8581ms · a1faa682bdfb

Question

[header] How to enjoy your birthday [title] If you like planning, then plan how you'll spend the day at least a week ahead. [step] Check that your friends and family are free for the day and make sure you choose things to do that can involve the people you'd like to spend time with. [title] Get plenty of sleep.

A. [step] A really good time to get a full night's sleep is about 7 am; the rest of the night should be uninterrupted. It is also good to unwind a bit in the evening and can be done so that you can think about each day.
B. [step] And make sure to get 7-9 hours of sleep a night-1 1/2 hours for adults and 4-9 hours for children. You can even make it up to four hours of sleep on weekends.
C. [step] The day before your birthday, go to bed at a reasonable hour so that you can get the sleep needed to guarantee that you'll have lots of energy on your birthday. [title] Treat yourself to an awesome birthday outfit.
D. [step] The best things you can do for a birthday is to get a good quality sleep. Your body clock is only going to be able to tell you the exact amount of sleep you need.

Rendered prompt

[header] How to enjoy your birthday [title] If you like planning, then plan how you'll spend the day at least a week ahead. [step] Check that your friends and family are free for the day and make sure you choose things to do that can involve the people you'd like to spend time with. [title] Get plenty of sleep.

Model answer

Extracted: C · Gold: C

sample 5 · hellaswag:00005144 · pass · 100.0% · 4402ms · 73c837dec8a2

Question

[header] How to succeed in distance learning [title] Familiarize yourself with the syllabus. [step] Think of your syllabus as the blueprint for your class. It should contain all of the class guidelines and rules, as well as what your instructor expects from you over the course of the semester.

A. If you're having trouble understanding all of the syllabus instructions, ask your instructor for advice. [substeps] To become a successful athlete, you have to have a strong understanding of the game and field, and then, with that knowledge, bring out your best on a single outing.
B. Include these with your notes, papers, assignments and applications so you'll be up to date and able to fully experience your classroom. [substeps] Do a quick background check on your syllabus for your instructor.
C. Bring this information to the syllabus, stating all required daily requirements, as well as any tests you may have scheduled or deadlines you must meet for your session. This way, you can easily adjust any assignments you may finish before the next session starts.
D. Make sure you read through this document in its entirety and keep track of it all semester. [substeps] Ask your instructor any questions you might have about the syllabus.

Rendered prompt

[header] How to succeed in distance learning [title] Familiarize yourself with the syllabus. [step] Think of your syllabus as the blueprint for your class. It should contain all of the class guidelines and rules, as well as what your instructor expects from you over the course of the semester.

Model answer

Extracted: D · Gold: D

sample 7 · hellaswag:00006318 · pass · 100.0% · 4399ms · 8562f92e036c

Question

[header] How to get other guys to stop staring at your pretty wife [title] Get your wife's opinion. [step] Value how your wife feels about the situation and get an understanding of how if affects her. If she is bothered by the stares, discuss whether or not you should intervene on her behalf.

A. If you feel the stares are a direct threat, try to think about your wife's feelings. Most of all, give her the benefit of the doubt-no guy wants to get a first-degree stare.
B. If she says " i don't feel like i need your attention right now, " then you can bring it up with her and let her ask her opinions. [substeps] Try to use humor to break the ice.
C. [substeps] If you start to think her boyfriend is doing this, report it back to him. If he is only doing this to steal her attention, admit that you feel sorry for him.
D. [substeps] If you have addressed this issue in the past, then you may already know how your wife feels about the attention that she receives. If this is the case, make a decision in advance on how you will respond so that you do not overreact.

Rendered prompt

[header] How to get other guys to stop staring at your pretty wife [title] Get your wife's opinion. [step] Value how your wife feels about the situation and get an understanding of how if affects her. If she is bothered by the stares, discuss whether or not you should intervene on her behalf.

Model answer

Extracted: D · Gold: D

Incorrect samples

sample 2 · hellaswag:00009130 · fail · 0.0% · 4405ms · 493709792d71

Question

[header] How to apply makeup for a glamour photography shoot [title] Clean your face. [step] Always start with a clean face. Leave no traces of your previous makeup.

A. If you will use a pore-cleaning strip, do it the day before to avoid makeup get caught in your newly cleansed pores. Exfoliate to feel that silky smooth texture of your skin.
B. Do not blush, define, or dry dry your face before applying eyeliner in this photoshoot. [title] Apply foundation and concealer to your face.
C. Inspect your face for blemishes with a makeup primer or something like that. Use a gentle cleanser and carefully pat your face dry with a soft soft cloth.
D. A healthy glow has been built up in your face over the past day. In order to achieve the glow, begin by exfoliating your face with a gentle face cleanser.

Rendered prompt

[header] How to apply makeup for a glamour photography shoot [title] Clean your face. [step] Always start with a clean face. Leave no traces of your previous makeup.

Model answer

Extracted: D · Gold: A

sample 6 · hellaswag:00005768 · fail · 0.0% · 4405ms · a8fb349bdd6f

Question

[header] How to make spicy chicken bites [title] Preheat the oven to 450º fahrenheit (230º celsius). [title] Line a large baking sheet with parchment paper. [title] Combine the chicken bites with the garlic and paprika.

A. [title] Season the chicken bites with salt and freshly ground black pepper. [title] Stir together the chickpeas, seasoning with nutmeg, currants, cumin and dried ginger in a mixing bowl.
B. [step] Toss the ingredients together in a large bowl until fully coated. [title] Toss in the flour and fold together until well-coated.
C. [title] Stir in ground beef and chicken bouillon and stir. [title] Add the oil and cook it for 2 minutes.
D. [title] Spread the chicken bites over the coated baking sheet. [title] Bake until they've cooked.

Rendered prompt

[header] How to make spicy chicken bites [title] Preheat the oven to 450º fahrenheit (230º celsius). [title] Line a large baking sheet with parchment paper. [title] Combine the chicken bites with the garlic and paprika.

Model answer

Extracted: D · Gold: B

sample 8 · hellaswag:00004787 · fail · 0.0% · 4399ms · fa5c0abbe527

Question

[header] How to date a guy from another school [title] Schedule your dates in advance. [step] It's easy to get wrapped up in daily life at school. Since you won't see your guy everyday, make sure to schedule time for each other after school or on weekends.

A. That way, you have a few days free to do something nice together and check in on each other. [substeps] You can also ask your guy when he needs something and send him your numbers.
B. Once school starts, prioritize being with him and avoiding being alone together. Try to schedule outings with his friends if you're in school together.
C. Work out your other commitments so that you get to see him at least once a week, if not more. Having a plan to see each other will give you something to look forward to and make the relationship exciting.
D. [substeps] Even if your guy spends all day on school, try scheduling dates with your friends. You can take advantage of your free time to do something fun together, like have a swim, walk your dog, or watch a play or movie.

Rendered prompt

[header] How to date a guy from another school [title] Schedule your dates in advance. [step] It's easy to get wrapped up in daily life at school. Since you won't see your guy everyday, make sure to schedule time for each other after school or on weekends.

Model answer

Extracted: D · Gold: C

sample 10 · hellaswag:00004482 · fail · 0.0% · 4399ms · 37232b0902b6

Question

[header] How to annoy your roommate [title] Leave trash around the room. [step] First of all, this will make your room smell bad. Secondly, it'll attract flies, ants, and other bugs, which will gross your roommate out.

A. She will now be forced to clean up your mess or will leave it there, and it will annoy her. [substeps] Avoid leaving anything you might also find gross.
B. Leave trash around the room until it's gone! Remind your roommate to check their room frequently for trash. Finally, move a few boxes that are not worth moving around, from your room all the way to another part of the house to make room for trash.
C. If you're really into it, leave garbage around your walls and furniture and knock around some boxes. [substeps] Alternatively, if you're in a particularly bad mood, you could hang out with a friend who's unlikely to find trash, like a teddy bear.
D. No one likes to be around these things anyway, so if you act like an annoyance right from the start, people will start to go around and pick up a lot of clutter. [substeps] Take a small trash can and fill it full of personal trash, such as toilet paper, empty paint cans, and trash bags.

Rendered prompt

[header] How to annoy your roommate [title] Leave trash around the room. [step] First of all, this will make your room smell bad. Secondly, it'll attract flies, ants, and other bugs, which will gross your roommate out.

Model answer

Extracted: B · Gold: A

sample 13 · hellaswag:00008048 · fail · 0.0% · 5041ms · c074de6e1eea

Question

[header] How to be a strict mom [title] Make it clear what the rules are. [step] Do not expect your children to read your mind. When you set rules, explain them to the entire household.

A. Enforce your limits and rules in certain ways. For example, if your child always goes to school without adult supervision, make it clear that this behavior happens every day and is to be expected.
B. For example, if there are four children in the house, explain that children should only have access to the least amount of supervision. If children are sick or injured, they should be allowed in a daycare.
C. You can even post certain rules on the fridge. This will help your children understand how to act in the house and why you are disciplining them when they get in trouble.
D. Ask your children to follow the rules one way and then follow them the other way. Make sure that everyone understands that there are consequences when the rules are ignored.

Rendered prompt

[header] How to be a strict mom [title] Make it clear what the rules are. [step] Do not expect your children to read your mind. When you set rules, explain them to the entire household.

Model answer

Extracted: D · Gold: C

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 7 · 7/1/2026, 1:06:35 PM · cmr23aiz202v4pi01gsi96f7r

80.1%

297/371 correct · 5 correct traces · 5 incorrect traces
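
The accuracy shown for a shard run, and a Wilson 95% interval of the kind the leaderboard ranks by, can be recomputed from the raw counts. The following is a minimal sketch, assuming the standard Wilson score formula with z = 1.96; the function name `wilson_interval` is hypothetical and the site's own implementation is not shown on this page.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score bounds for a binomial proportion.

    Assumption: z = 1.96 approximates a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    # Centre of the interval is pulled slightly toward 0.5.
    center = (p + z2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Counts taken from the shard summary above.
    correct, total = 297, 371
    lower, upper = wilson_interval(correct, total)
    print(f"accuracy {correct / total:.1%}, Wilson 95% CI {lower:.1%} to {upper:.1%}")
```

Applied to the 717/772 leaderboard entry, the same function gives bounds that round to the listed 90.8% and 94.5%, which is consistent with this formula being the one in use, though it does not confirm it.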

Correct samples

sample 1 · hellaswag:00007843 · pass · 100.0% · 5585ms · 2b00dd2efd79

Question

[header] How to cite a book [title] Familiarize yourself with the accepted citation styles. [step] Use only one style when citing a book reference-each style has very specific rules about capitalization, punctuation and placement of the facts. That said, all styles are designed for the same purpose: to give proper credit where credit is due.

A. It is not necessary to include more than your attention span. You could include reading journal articles (if they were used before) or writing your own start point book reviews (if they are not never published by the author).
B. Listed below are the most common styles : [substeps] Modern language association (mla) style. This is a citation style most often used within the liberal arts and humanities departments of universities.
C. [substeps] Use a capital letter or a comma after the reference terms of a book reference. This will refer to the same format in a sentence-to-sentence format.
D. [substeps] There are several great writers' styles for reference: paul golding, glenwhelp, dell, k. paul wilson (1960s innovator), william investments investors, allan cannon (1938), bob n.

Rendered prompt

[header] How to cite a book [title] Familiarize yourself with the accepted citation styles. [step] Use only one style when citing a book reference-each style has very specific rules about capitalization, punctuation and placement of the facts. That said, all styles are designed for the same purpose: to give proper credit where credit is due.

Model answer

Extracted: B · Gold: B

sample 2 · hellaswag:00006278 · pass · 100.0% · 5583ms · c6f497c808d7

Question

[header] How to be emo in middle school [title] Learn what it means to be emo. [step] Before you can become emo, you have to understand exactly what it means. " emo " is short for emotive, and people who consider themselves to be emo are usually in touch with their emotions.

A. They don't feel the need to hide how they feel, even when they are sad or angry. [substeps] While music, clothing, hair, and makeup can play a large role in being emo, being comfortable with your feelings and expressing them is the most important part.
B. This is an important step to emo, so have a good understanding of your gender. You don't have to have what everyone else has.
C. You don't need a dictionary dictionary and stick with the " emotive ". Instead, learn what emotive means, like " i am ", " i am skinny ", " i am strong and strong ", " i'm confident in my body ", etc.
D. You can't be emo without being emo. [substeps] If you're emo, you want an emo vibe that's unique to you.

Rendered prompt

[header] How to be emo in middle school [title] Learn what it means to be emo. [step] Before you can become emo, you have to understand exactly what it means. " emo " is short for emotive, and people who consider themselves to be emo are usually in touch with their emotions.

Model answer

Extracted: A · Gold: A

sample 3 · hellaswag:00004307 · pass · 100.0% · 5582ms · 0729af895014

Question

[header] How to make a smoke bomb [title] Gather three or four ping-pong balls. [step] For this method, you don't have to buy any extra ingredients other than basic household supplies, like aluminum foil, a pencil, screwdriver, and scissors. [title] Cut a hole in the top of one of the balls.

A. [step] You can poke a hole with a screwdriver or a knife. [title] Cut the other balls into small pieces and put them in the hole.
B. [step] Use a knick knack knife to create an x around one of the balls. [title] Punch a hole in the bottom of one of the balls.
C. [step] Align the hole in the top ball with the one that's down. [substeps] The hole needs to be about six-ten inches deep and about two inches wide.
D. [step] Grab a piece of foil about 15 inches from the edge of the ball. Get a zig-zag or intricate pattern.

Rendered prompt

[header] How to make a smoke bomb [title] Gather three or four ping-pong balls. [step] For this method, you don't have to buy any extra ingredients other than basic household supplies, like aluminum foil, a pencil, screwdriver, and scissors. [title] Cut a hole in the top of one of the balls.

Model answer

Extracted: A · Gold: A

sample 4 · hellaswag:00003431 · pass · 100.0% · 5582ms · 727d315e3ed2

Question

[header] How to apply shadow on hooded eyes [title] Prime your eyelids. [step] While this is important for everyone, it's especially critical for those with hooded eyes. Primer creates a base for your makeup that helps it stay put and last all day.

A. [substeps] If you use liquid primer, try it first before you start applying it on your eyes. Otherwise, you'll try a different primer beforehand.
B. Primer also keeps your eyelids from getting heavy and swollen. To prime your eyelids : [substeps] Apply an eyeshadow to your lids using a dark gray eye liner or gel instead of a cream or gel liner.
C. Because hooded eyes are prone to smudging and smearing, primer can make all the difference. [substeps] Use your fingertip to apply the primer, and allow it to absorb into your skin for a minute or so before proceeding.
D. Apply a primer to your eyelids and eyelid over the primer you applied earlier. [substeps] Use your powder brush, primer, or a makeup sponge to carefully prime your eyelids.

Rendered prompt

[header] How to apply shadow on hooded eyes [title] Prime your eyelids. [step] While this is important for everyone, it's especially critical for those with hooded eyes. Primer creates a base for your makeup that helps it stay put and last all day.

Model answer

Extracted: C · Gold: C

sample 5 · hellaswag:00004697 · pass · 100.0% · 6999ms · 913f8d7c63a6

Question

[header] How to get favorites on your deviantart photo [title] Create your own username. [step] Have a name that can be easily recognized and defines you as a person. [substeps] Your username doesn't have to be your first name, but some well-known deviantart members remember their art style through their da names.

A. Avoid using names of famous celebrities or character from movies, video games, or animes. To change your username, you have to purchase premium membership.
B. If you don't want people to recognize you as a person, try to keep the name honest and easy to identify. You can do this by including your full name and what year it is, but stay away from androgynous names.
C. For instance, president christian founder of the white house had a username that " peter " (modern), and there are so many other names available today. In the wild, some names are simply variations of the last name.
D. [title] Create usernames for your favorite characters. [step] Depending on your theme and design, you may have many different options.

Rendered prompt

[header] How to get favorites on your deviantart photo [title] Create your own username. [step] Have a name that can be easily recognized and defines you as a person. [substeps] Your username doesn't have to be your first name, but some well-known deviantart members remember their art style through their da names.

Model answer

Extracted: A · Gold: A

Incorrect samples

sample 6 · hellaswag:00002238 · fail · 0.0% · 6998ms · fec39366eba5

Question

The man lays down and had a hard time breathing. After putting down sand he drinks a warm beverage. we

A. see the man underwater swimming.
B. see the ending credits screen.
C. then see two guys swimming nak * d and drinking.
D. see no other man and the other man is shown shirtless.

Rendered prompt

The man lays down and had a hard time breathing. After putting down sand he drinks a warm beverage. we

Model answer

Extracted: A · Gold: B

sample 19 · hellaswag:00000609 · fail · 0.0% · 5583ms · 0df9993b1b59

Question

A man and a woman are in a ring wrestling while a referee is standing in the corner of the ring along with a woman holding a wrestling belt. man and woman

A. are doing wrestling movements in a ring in a dark room and a referee is waching and screaming at them.
B. are wrestling in punches and kicking in a ring of mats in front of judges.
C. are holding a wrestling pole as the referee is standing on the other side of the ring.
D. are standing on opposite ends of the ring holding each other's belts in their hands while lifting each other up in the air upside down.

Rendered prompt

A man and a woman are in a ring wrestling while a referee is standing in the corner of the ring along with a woman holding a wrestling belt. man and woman

Model answer

Extracted: C · Gold: A

sample 26 · hellaswag:00007917 · fail · 0.0% · 6998ms · 3cede662d957

Question

[header] How to raise children successfullly [title] Help children learn about nature. [step] Teaching children to love and not fear nature is a vital part of teaching children that this a larger world than their bedroom and games station. It will give them the ability to feel a connection with something other than the house, school and local park.

A. [substeps] The elementary curriculum requires children to learn about nature from birth, and learn how to identify it and understand it. Even at a young age, children become proficient at discovering and appreciating nature and the way it feels to them.
B. It will also be a learning experience for young children. [substeps] Find new ways to enjoy outdoor sports.
C. Pack a couple sandwiches, take the kids into the woods and hike a deer trail through the woods. As you go : [substeps] Let the kids ask questions.
D. Many children like hiking, camping, wildlife photography, or nature watching, so they will enjoy the outdoors with you. [title] Show gratitude for your friends, family, and fellow caregivers.

Rendered prompt

[header] How to raise children successfullly [title] Help children learn about nature. [step] Teaching children to love and not fear nature is a vital part of teaching children that this a larger world than their bedroom and games station. It will give them the ability to feel a connection with something other than the house, school and local park.

Model answer

Extracted: D · Gold: C

sample 28 · hellaswag:00008694 · fail · 0.0% · 5583ms · f3f93fd4c75e

Question

[header] How to catch a cheating partner [title] Notice if your partner spends less time with you. [step] Your partner may develop sudden new interests or friends if he or she is cheating. He or she may begin working longer hours.

A. He or she may have more commitments such as sports games and nights out. [substeps] Your partner may try to make as big of an impact on your life as possible.
B. If you ask your partner where he or she is going, your partner may give you a vague answer such as " out. " or " meeting up with some people.
C. You may notice that your partner spends more time with you when you are not together. Someone who is cheating may spend more time than the other person.
D. [substeps] Your partner may be dating other girls instead of spending time with you. He or she may not get extra money to spend with you.

Rendered prompt

[header] How to catch a cheating partner [title] Notice if your partner spends less time with you. [step] Your partner may develop sudden new interests or friends if he or she is cheating. He or she may begin working longer hours.

Model answer

Extracted: C · Gold: B

sample 34 · hellaswag:00006486 · fail · 0.0% · 6548ms · 36fbd6541666

Question

[header] How to make a bun for short hair [title] Gather your hair into a high ponytail. [step] Use a comb to gather your hair high on your head, either right on top or slightly lower down, according to what you enjoy. Secure your hair with a hair elastic.

A. Make sure it is tightly secured, so your hair won't fall out. [substeps] You might want to use some bobby pins or hair spray to keep your hair in place at the top of your head.
B. [title] Layer two layers of hair on top of your head. [step] Secure two layers of hair on top of your head, at equal locations-there should be at least 2 layers of hair on each side of your head.
C. [substeps] If you prefer a retro look, then gather your hair into a high ponytail and twist. If you prefer a more traditional hairstyle, then gather your hair into a low bun (very nice, but not preppy style) and secure with a hair elastic.
D. [title] Gather your hair into a low ponytail. [step] You may ask the person at the hair salon to split your hair into three sections.

Rendered prompt

[header] How to make a bun for short hair [title] Gather your hair into a high ponytail. [step] Use a comb to gather your hair high on your head, either right on top or slightly lower down, according to what you enjoy. Secure your hair with a hair elastic.

Model answer

Extracted: D · Gold: A

Qwen3.6-27B-NVFP4 · NVFP4 · unknown protocol · unknown agent

by Stewart_of_Mars · shard 6 · 7/1/2026, 1:05:11 PM · cmr238qh602kdpi01t407b2tp

80.1%

297/371 correct · 5 correct traces · 5 incorrect traces

Correct samples

sample 1 · hellaswag:00000731 · pass · 100.0% · 4135ms · 0190bd41cc1f

Question

Maria is demonstrating how to make a nyquil cocktail drink. she

A. shows the ingredients required for the drink.
B. pours it through a straw into a glass and washes it down with a cup of soda water.
C. shows each scale of the cocktail by adding different colored liquids.
D. pours the ice in a small glass and squirts lemon juice into the glass.

Rendered prompt

Maria is demonstrating how to make a nyquil cocktail drink. she

Model answer

Extracted: A · Gold: A

sample 2 · hellaswag:00006203 · pass · 100.0% · 4124ms · 3f0139f1e050

Question

[header] How to zone agricultural land for use as a wedding venue [title] Talk to your neighbors. [step] Your neighbors can be your strongest supporters or your most bitter rivals if you want to rezone your property. The best way to potentially get them on your side is to keep them informed of your plans every step of the way.

A. [substeps] In the early stages, let them know about your plans and reassure them that you will keep them informed on developments. Give them the chance to ask questions, and do everything you can to assure them that events you host won't cause any disturbances to them or their property.
B. [substeps] Make sure you know your neighbors' lives and can help them out if they disagree about your plan. Give them written correspondence so they can be helpful to you if needed.
C. This means that if your neighbors are anything but informed, you should talk to them as well. If you don't know them well enough to keep detailed notes, reach out to them and ask them to help keep a record of your plans.
D. [substeps] Don't forget that some of your family's acreage may need transplant rights. Always decide ahead of time what kind of liens will be put on what land you are planning on and why.

Rendered prompt

[header] How to zone agricultural land for use as a wedding venue [title] Talk to your neighbors. [step] Your neighbors can be your strongest supporters or your most bitter rivals if you want to rezone your property. The best way to potentially get them on your side is to keep them informed of your plans every step of the way.

Model answer

Extracted: A · Gold: A

sample 4 · hellaswag:00009360 · pass · 100.0% · 4385ms · bfa9c933b4b0

Question

[header] How to get ready for a party [title] Prepare anything you need to bring. [step] If the party is a potluck, you'll want to make and bring a dish to share. If the party is a birthday party or some kind of shower, you'll need to bring a gift.

A. If you are throwing a birthday party or if your parents are fine with it, feel free to bring these items. [substeps] In a party, use sachets or strings to write things in.
B. If the party is a dinner party, you should bring a bottle of wine or a hostess gift. At more casual parties in high school or college, it's generally expected that guests won't bring anything unless it's been specified on the invitation.
C. You'll also want to bring a good notebook and a pen so you can write down ideas on how to best decorate the house. A classic approach would be to bring plastic utensils, such as wine glasses.
D. [title] Know what you're expected to wear. [step] If you're just going to school, you might wear a dress, but if you're doing other things, such as going to a dance, you might wear some pajamas or something a little fancier like a dress.

Rendered prompt

[header] How to get ready for a party [title] Prepare anything you need to bring. [step] If the party is a potluck, you'll want to make and bring a dish to share. If the party is a birthday party or some kind of shower, you'll need to bring a gift.

Model answer

Extracted: B · Gold: B

sample 5 · hellaswag:00004111 · pass · 100.0% · 4124ms · b6a1d249ead7

Question

[header] How to style mom jeans [title] Pick a pair of jeans that fit your waist well. [step] Mom jeans are meant to sit high on your waist. When picking out a new pair, try them on and see how the waist fits-you don't want the jeans too tight or too loose.

A. Try on any pair and see if it's even snug. [substeps] If your mom jeans tend to dig into your legs, look for jeans that come up above the knee.
B. [substeps] Pick out jeans that are at the leg length of your legs. If you have shorter legs, opt for what looks best to you, rather than what might look nice with longer pants.
C. [substeps] The high waist will show off your waist and elongate your legs. [title] Choose a leg style that's appropriate for your figure.
D. Try buying pairs that are so low that you can still walk in them but they are a little looser. [substeps] Ask your mom about how to size your jeans.

Rendered prompt

[header] How to style mom jeans [title] Pick a pair of jeans that fit your waist well. [step] Mom jeans are meant to sit high on your waist. When picking out a new pair, try them on and see how the waist fits-you don't want the jeans too tight or too loose.

Model answer

Extracted: C · Gold: C

sample 6 · hellaswag:00009418 · pass · 100.0% · 4135ms · 3accc12092a0

Question

[header] How to diagnose heel spurs [title] Locate the pain. [step] Heel spurs can show up in multiple places on your heel. This can cause the pain to be slightly different depending on exactly where the heel spur is.

A. It can be hard to judge the strength of the spur and whether or not it is painful at all. When you just try to be more specific about where it is, your foot and heel will become increasingly entangled and there will be no wiggle room.
B. They can be located at the back of the heel or under the heel, near the sole of your foot. If you are experiencing pain along the back of your foot, up through your ankle, you might have a heel spur on the back of your heel.
C. The first position you identify is the thigh. You will see white or pinkish swelling and inflammation along the arch of the heel.
D. The most common places you might see the sensation of the pain are your heels and your nail. [substeps] You may discover that the pain is more acute in one heel then it is in another.

Rendered prompt

[header] How to diagnose heel spurs [title] Locate the pain. [step] Heel spurs can show up in multiple places on your heel. This can cause the pain to be slightly different depending on exactly where the heel spur is.

Model answer

Extracted: B · Gold: B

Incorrect samples

sample 3 · hellaswag:00002742 · fail · 0.0% · 4135ms · 471f69d3a29f

Question

. a man with a orange shirt and blue gloves

A. is standing in a snow covered parking lot talking while holding a skateboard and talking.
B. is doing gymnastics on an elliptical.
C. is shown operating a fire torch machine.
D. approaches a green motor vehicle that is in a pool.

Rendered prompt

. a man with a orange shirt and blue gloves

Model answer

Extracted: A · Gold: C

sample 9 · hellaswag:00000155 · fail · 0.0% · 4135ms · 8ee692ee5d56

Question

The harmonica is held up and shown. a person

A. holds a harmonica between his hands a plays a song.
B. is flying through the sky.
C. runs up and grabs the guitar, then begins strumming.
D. is strumming an electric guitar.

Rendered prompt

The harmonica is held up and shown. a person

Model answer

Extracted: D · Gold: A

sample 10 · hellaswag:00002117 · fail · 0.0% · 4137ms · 0684eb13b568

Question

The woman points to a water splash on the counter next to the sink after turning on the water at the sink. the woman

A. takes out a spray bottle and a contact lens.
B. talks to the camera and washes her hands before rinsing her hands.
C. turns on the water with a giant sponge, grabs soap and adds some drops of the water.
D. then grabs a paper towel and wipes down the sink counter, the inside of the sink and the faucet spout.

Rendered prompt

The woman points to a water splash on the counter next to the sink after turning on the water at the sink. the woman

Model answer

Extracted: B · Gold: D

sample 13 · hellaswag:00004337 · fail · 0.0% · 4132ms · 4c39260a00d0

Question

[header] How to advertise for wedding photography [title] Purchase a small newspaper advertisement in your local paper. [substeps] Start out by placing a print ad in your town's sunday paper. Newspaper advertisement returns have been on the decline for some time, but ad space can be relatively inexpensive depending on the paper.

A. If you're looking for outdoor photography, try local paintshops or area recreation areas. They also have online sites to let you advertise on a regular basis, such as forums.
B. Increase your ad space if you get a number of calls after your first attempt. This larger ad should provide an exemplary photograph and a range of contact information.
C. Local newspapers are also a great place to post a brief ad--whatever type of photo the print advertisement contains--for your most popular wedding photography locations. [title] Advertise your photography in local newspapers.
D. Customers walking by today are often drawn to newspapers where images are sold relatively cheaply. If you're a reviewer of the ads on the classified sections of your newspaper, sometimes the word " wedding " stands out.

Rendered prompt

[header] How to advertise for wedding photography [title] Purchase a small newspaper advertisement in your local paper. [substeps] Start out by placing a print ad in your town's sunday paper. Newspaper advertisement returns have been on the decline for some time, but ad space can be relatively inexpensive depending on the paper.

Model answer

Extracted: C · Gold: B

sample 16 · hellaswag:00001356 · fail · 0.0% · 4137ms · 3e8c97e1ca24

Question

A lady licks an ice cream cone and holds a baby. The lady takes the pacifier from the baby and lets the baby eat ice cream. the baby

A. eats pieces of the ice cream and mixes it in a bowl.
B. makes a face and stare at the camera as she eats.
C. isn't happy but he feeds it from the ice cream.
D. is getting more ice cream, and the lady keeps licking his face.

Rendered prompt

A lady licks an ice cream cone and holds a baby. The lady takes the pacifier from the baby and lets the baby eat ice cream. the baby

Model answer

Extracted: D · Gold: B