Local Reasoning Mini

Official

A lightweight 10-question sanity check for locally served models. Designed for the trusted /api/evals/execute path.

Category: reasoningEval type: Custom server-sideVersion: v1.0Submitted by: Lottolabs

Custom eval: run it locally with the LocalMaxxing CLI and upload artifacts, or use POST /api/evals/execute if your OpenAI-compatible endpoint is publicly reachable.

Eval Details

Scoring

Exact Match

Aggregation

Mean

Direction

Higher is better

Tasks

2 tasks

Default Run Config

TopP: 1Temperature: 0

Task	Dataset	Weight	Shots	Max Tokens
Basic Math basic_math	5 inline items	0.5	Default	16
Basic Logic basic_logic	5 inline items	0.5	Default	8

Leaderboard— best run per model

#	Model	Score	Quant	Hardware
	Qwen3.6-27B Qwen	100.0%	IQ4_NL	NVIDIA GeForce RTX 3090
2	Darwin-36B-Opus Qwen	80.0%	Q8_0	RTX 5060 Ti

Task Breakdown— top model

basic_logic

100.0%

— · 0 samples

basic_math

100.0%

— · 0 samples