ModelsLeaderboardHardwareEvalsTrainRentalsAPI Docs
Language

Local Reasoning Mini

Official

A lightweight 10-question sanity check for locally served models. Designed for the trusted /api/evals/execute path.

Category: reasoningEval type: Custom server-sideVersion: v1.0Submitted by: Lottolabs
Custom eval: run it locally with the LocalMaxxing CLI and upload artifacts, or use POST /api/evals/execute if your OpenAI-compatible endpoint is publicly reachable.

Eval Details

Scoring
Exact Match
Aggregation
Mean
Direction
Higher is better
Tasks
2 tasks

Default Run Config

TopP: 1Temperature: 0
TaskDatasetWeightShotsMax Tokens
Basic Math
basic_math
5 inline items0.5Default16
Basic Logic
basic_logic
5 inline items0.5Default8

Leaderboard— best run per model

#ModelScoreQuantHardware
Qwen3.6-27B
Qwen
100.0%
IQ4_NLNVIDIA GeForce RTX 3090
2Darwin-36B-Opus
Qwen
80.0%
Q8_0RTX 5060 Ti

Task Breakdown— top model

basic_logic
100.0%
· 0 samples
basic_math
100.0%
· 0 samples