Community benchmark suites for evaluating local LLM quality. Submit results via the API.
A lightweight 10-question sanity check for locally served models. Designed for the trusted /api/evals/execute path.
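
As a rough illustration, a client might prepare a call to the /api/evals/execute path as sketched below. Only the path comes from the text above. The POST method, the JSON body, the field names `suite` and `model`, the helper `build_execute_request`, and the host, suite id, and model name in the usage lines are all hypothetical placeholders, so check the real API schema before relying on any of them.

```python
import json
from urllib import request


def build_execute_request(base_url: str, suite_id: str, model: str) -> request.Request:
    """Build (but do not send) a request for the /api/evals/execute path.

    Assumption: the endpoint accepts a JSON POST body. The "suite" and
    "model" field names are hypothetical and are not taken from the API.
    """
    body = json.dumps({"suite": suite_id, "model": model}).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + "/api/evals/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, suite id, and model name, for illustration only.
req = build_execute_request("http://localhost:8000/", "sanity-10", "my-local-model")
print(req.get_method(), req.full_url)
```

The request is only constructed here, not sent. Passing it to `urllib.request.urlopen` would perform the call, and a trusted path like this one may additionally require credentials that this sketch does not model.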