A community benchmark suite for evaluating the quality of local LLMs. Results are submitted via the API.
A lightweight 10-question sanity check for locally served models. Designed for the trusted /api/evals/execute path.