LocalMaxxing
Manual
AI draft
Build an eval suite
Create a benchmark card, attach inline or bucket-backed datasets, and submit it for approval.
Name *
Slug *
Category *
Runner * (options: Custom local, LM-Eval Harness)
Scoring * (options: Exact match, F1, Pass@k, LLM judge, User rating)
Description
Source URL
Task 1
Task key *
Display name *
Weight
Task type (options: Multiple choice, QA, Code, Judge)
Dataset source (options: Inline samples, S3 bucket upload, URL, Hugging Face)
Prompt template
{{input}}
Sample 1
Input *
Choices, one per line *
Gold answer *
Add sample
Add another task
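
The fields above can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, default values, and the weighted-average aggregation are assumptions made for this example, not LocalMaxxing's actual schema or API. It mirrors the form (suite, tasks, inline samples), fills the `{{input}}` placeholder in a prompt template, and applies exact-match scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Mirrors the "Sample" fields: Input, Choices (one per line), Gold answer.
    input: str
    choices: list[str]
    gold_answer: str


@dataclass
class Task:
    # Mirrors the "Task" fields. Defaults are assumptions for this sketch.
    task_key: str
    display_name: str
    weight: float = 1.0
    task_type: str = "multiple_choice"
    dataset_source: str = "inline"
    prompt_template: str = "{{input}}"
    samples: list[Sample] = field(default_factory=list)


@dataclass
class EvalSuite:
    # Mirrors the top-level form fields; names are hypothetical.
    name: str
    slug: str
    category: str
    runner: str
    scoring: str
    description: str = ""
    source_url: str = ""
    tasks: list[Task] = field(default_factory=list)


def render_prompt(template: str, sample: Sample) -> str:
    """Substitute the sample input for the {{input}} placeholder."""
    return template.replace("{{input}}", sample.input)


def exact_match(prediction: str, gold: str) -> bool:
    """Exact match after trimming whitespace and ignoring case (an assumed normalization)."""
    return prediction.strip().lower() == gold.strip().lower()


def suite_score(suite: EvalSuite, predictions: dict[str, list[str]]) -> float:
    """Weighted mean of per-task exact-match accuracy (assumed aggregation).

    `predictions` maps each task key to one prediction per sample, in order.
    Missing predictions count as incorrect.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for task in suite.tasks:
        if not task.samples:
            continue  # a task without samples contributes nothing
        preds = predictions.get(task.task_key, [])
        correct = sum(
            exact_match(pred, sample.gold_answer)
            for pred, sample in zip(preds, task.samples)
        )
        weighted_sum += task.weight * correct / len(task.samples)
        total_weight += task.weight
    return weighted_sum / total_weight if total_weight else 0.0
```

A suite built this way could be scored by passing model outputs keyed by task, for example `suite_score(suite, {"capitals": ["Paris"]})`. How the platform itself combines task weights is not stated on the form, so the weighted mean here is only one plausible reading.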