LocalMaxxing
Build an eval suite
Create a benchmark card, attach inline or bucket-backed datasets, and submit it for approval.
Name *
Slug *
Category *
Runner *: Custom local, LM-Eval Harness
Scoring *: Exact match, F1, Pass@k, LLM judge
Description
Source URL
Task 1
Task key *
Display name *
Weight
Task type: Multiple choice, QA, Code, Judge
Dataset source: Inline samples, S3 bucket upload, URL, Hugging Face
Prompt template: {{input}}
Sample 1
Input *
Choices, one per line *
Gold answer *
Add sample
Add another task
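
The form above can be read as a small nested record: a suite holds tasks, and each task holds samples. The sketch below mirrors that shape in Python and shows one plausible way a multiple-choice suite with inline samples and exact-match scoring might be evaluated. Everything here is an assumption made for illustration: the class and field names (EvalSuite, Task, Sample), the case-insensitive match, and the weighted-mean aggregation are hypothetical and are not taken from LocalMaxxing's actual schema or runner.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    input: str          # "Input *"
    choices: List[str]  # "Choices, one per line *"
    gold: str           # "Gold answer *"


@dataclass
class Task:
    key: str                             # "Task key *"
    display_name: str                    # "Display name *"
    weight: float = 1.0                  # "Weight" (default is an assumption)
    task_type: str = "multiple_choice"   # "Task type"
    prompt_template: str = "{{input}}"   # "Prompt template"
    samples: List[Sample] = field(default_factory=list)


@dataclass
class EvalSuite:
    name: str
    slug: str
    category: str
    runner: str   # e.g. "custom_local" or "lm_eval_harness" (hypothetical values)
    scoring: str  # e.g. "exact_match" (hypothetical value)
    tasks: List[Task] = field(default_factory=list)


def render_prompt(template: str, sample: Sample) -> str:
    """Substitute the sample input for the {{input}} placeholder."""
    return template.replace("{{input}}", sample.input)


def exact_match(prediction: str, gold: str) -> bool:
    """Assumed normalization: trim whitespace and ignore case."""
    return prediction.strip().lower() == gold.strip().lower()


def score_suite(suite: EvalSuite, predict: Callable[[str], str]) -> float:
    """Weighted mean of per-task exact-match accuracy (assumed aggregation)."""
    total_weight = sum(task.weight for task in suite.tasks)
    if total_weight <= 0:
        raise ValueError("suite needs at least one task with positive weight")
    weighted_sum = 0.0
    for task in suite.tasks:
        if not task.samples:
            raise ValueError(f"task {task.key!r} has no samples")
        correct = sum(
            exact_match(predict(render_prompt(task.prompt_template, s)), s.gold)
            for s in task.samples
        )
        weighted_sum += task.weight * correct / len(task.samples)
    return weighted_sum / total_weight


# Usage: a stand-in "model" that looks answers up by the rendered prompt.
suite = EvalSuite(
    name="Demo suite", slug="demo-suite", category="knowledge",
    runner="custom_local", scoring="exact_match",
    tasks=[
        Task(key="basics", display_name="Basics", weight=1.0, samples=[
            Sample("Capital of France?", ["Paris", "Rome"], "Paris"),
            Sample("2 + 2?", ["4", "5"], "4"),
        ]),
        Task(key="space", display_name="Space", weight=3.0,
             prompt_template="Q: {{input}}\nA:", samples=[
                 Sample("Largest planet?", ["Jupiter", "Mars"], "Jupiter"),
             ]),
    ],
)
answers = {
    "Capital of France?": "Paris",
    "2 + 2?": "5",                      # wrong on purpose
    "Q: Largest planet?\nA:": "jupiter",  # matches after case folding
}
print(score_suite(suite, lambda prompt: answers[prompt]))  # 0.875
```

The first task scores 0.5 with weight 1 and the second scores 1.0 with weight 3, giving (0.5 + 3.0) / 4 = 0.875. The other scoring options in the form (F1, Pass@k, LLM judge) would need their own scoring functions in place of exact_match.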