모델리더보드모델Marketplace평가학습렌탈API 문서
Language
Your Ad Here

Tech Greenpost

Official

A five-prompt creative writing eval where models draft short tech-related 4chan-style greenposts. DeepSeek judges format compliance, reasonable length, tech relevance, coherence, and humor.

Category: writingEval type: Custom server-sideVersion: v1.0Submitted by: Community
Custom eval: run it locally with the LocalMaxxing CLI and upload artifacts, or use POST /api/evals/execute if your OpenAI-compatible endpoint is publicly reachable.

Eval Details

Scoring
Llm Judge
Aggregation
Mean
Direction
Higher is better
Tasks
5 tasks

Default Run Config

TopP: 0.95Temperature: 0.8
TaskDatasetWeightShotsMax Tokens
GPU Driver Greenpost
gpu_driver_greenpost
1 inline item1Default450
Homelab Greenpost
homelab_greenpost
1 inline item1Default450
Package Manager Greenpost
package_manager_greenpost
1 inline item1Default450
Quantization Greenpost
quantization_greenpost
1 inline item1Default450
Debugging Greenpost
debugging_greenpost
1 inline item1Default450

Leaderboard— best run per model

#ModelScoreQuantHardware
Qwen3.6-27B
Qwen
91.0%
Q4_KMRTX 3090
2Qwen3.6-27B
Qwen
75.2%
IQ4_NLRTX 3090

Task Breakdown— top model

debugging_greenpost
95.0%
· 1 samples
sample 195.0% · ms · dacf479d68a3
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A programmer debugs a CUDA or ROCm error for hours and discovers the problem was embarrassingly simple.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with > and must not have a space after >.
- Each non-empty line must match this pattern: ^>\S
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.

Just output the greenpost directly without any thinking or reasoning.
Model answer
>spent six hours staring at a hipLaunchKernelGGL segfault in my ROCm setup
>checked driver versions, validated memory alignment down to the byte, rewrote the kernel in PTX just to be sure
>thought it was a subtle bug with shared memory bank conflicts or async stream synchronization
>rebooted three times and even reinstalled amdgpu-hip-sdk from source
>finally added a printf inside the kernel and realized the grid dimensions calculated to zero blocks
>turns out I mixed up width and height in the ceil division macro
>kernel never launched, host memory was uninitialized garbage, segfault happened during validation on CPU side
>facepalm so hard my monitor wobbled
gpu_driver_greenpost
95.0%
· 1 samples
sample 195.0% · ms · d262abdb055a
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A late-night GPU driver update breaks a local LLM inference setup right before a benchmark run.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with > and must not have a space after >.
- Each non-empty line must match this pattern: ^>\S
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.

Just output the greenpost directly without any thinking or reasoning.
Model answer
>mfw I spent three days quantizing my local llama model to fit on 24GB vram
>scheduled the benchmark suite for 3am to hit the grid while I sleep
>wake up to check results and find cuda errors everywhere in the logs
>nvidia pushed a silent "critical" driver update at 2:58am via windows update
>new driver version drops support for the specific compute capability my old 3090 relies on for flash attention
>tried rolling back but the installer corrupted the registry keys for wddm
>had to boot into linux live usb just to recover the checkpoint files before the scheduled cleanup script nuked them
>now waiting six hours for a fresh docker container build while my internet throttles during peak hours
>never trust microsoft's "recommended updates" when you're running bare metal inference workloads
>tfw your benchmark scores are now irrelevant because you missed the submission window
homelab_greenpost
85.0%
· 1 samples
sample 185.0% · ms · 2978df92756b
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A home lab NAS, router, and self-hosted dashboard all start failing during a simple cable cleanup.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with > and must not have a space after >.
- Each non-empty line must match this pattern: ^>\S
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.

Just output the greenpost directly without any thinking or reasoning.
Model answer
>Decided to finally tackle the spaghetti behind my rack after months of ignoring it.
>Pulled a single ethernet patch cord that looked loose near the bottom switch port.
>Three seconds later the UniFi controller dashboard vanished from my browser.
>Rushed over to check the router only to find the LED ring completely dark.
>Tried logging into the TrueNAS array but got nothing but connection refused errors.
>Realized I accidentally yanked the main power injector for the PoE switch instead of a data cable.
>Spent an hour reseating every connector and double checking the rack PDUs.
>Lesson learned: always trace cables to their source before pulling anything in a home lab.
package_manager_greenpost
85.0%
· 1 samples
sample 185.0% · ms · b51be5edae48
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: A harmless package manager upgrade turns a working developer environment into dependency chaos.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with > and must not have a space after >.
- Each non-empty line must match this pattern: ^>\S
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.

Just output the greenpost directly without any thinking or reasoning.
Model answer
>be me, junior dev at startup
>thought I'd be responsible and run sudo npm update on Friday afternoon
>package.json says "just bumping minor versions, what could go wrong"
>watch the terminal scroll like a waterfall of green text for three minutes
>reboot docker containers, everything crashes instantly
>node_modules weighs more than the ISS now
>try to roll back but git status looks like a warzone
>peer dependency conflicts everywhere, npm is screaming about circular references
>boss walks in asking why CI/CD pipeline is burning red
>realize I just upgraded lodash and broke the entire build system because of transitive dependencies from 2016
>spend the weekend manually editing lockfiles while crying into my mechanical keyboard
quantization_greenpost
95.0%
· 1 samples
sample 195.0% · ms · da204e5927ef
Question
Write exactly one short technology-themed 4chan-style greenpost about the scenario below.

Scenario: Someone keeps lowering quantization to fit a model into VRAM and slowly notices the answers getting weird.

Requirements:
- 5 to 12 non-empty lines.
- Every non-empty line must start with > and must not have a space after >.
- Each non-empty line must match this pattern: ^>\S
- Reasonable length: 350 to 1200 characters total.
- Tech related, coherent, and readable as a tiny story.
- No title, no markdown, no bullets, no explanation outside the greenpost.
- Avoid slurs, harassment, sexual content, personal data, and actionable wrongdoing.

Just output the greenpost directly without any thinking or reasoning.
Model answer
>loaded the 70b checkpoint at fp16 and watched vram instantly max out
>dropped down to q8_0 since casual chat doesnt need full precision anyway
>responses stayed sharp so i pushed further into q5_k_m for good measure
>still had breathing room so why not squeeze it into q4_k_s while im at it
>first reply about thermodynamics made sense but the follow up mentioned sentient routers
>shrugged and cranked it to q2_k just to see if it would actually fit in my 16gb card
>suddenly the model started answering basic arithmetic with surreal poetry about voids
>asked for a simple bash script and got back a haiku while attention weights collapsed
>realized i had quantized away every logical pathway keeping coherence intact
>now the terminal just outputs glitched emojis whenever i press enter