Build an eval suite

Create a benchmark card, attach inline or bucket-backed datasets, and submit it for approval.

Name *
Slug *

Category *
Runner *
Scoring *

Description
Source URL

Task 1

Task key *
Display name *
Weight

Task type
Dataset source

Prompt template

Sample 1

Input *
Choices, one per line *
Gold answer *
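The form fields above nest as suite → tasks → samples. The sketch below models that shape as Python dataclasses and checks for blank required fields. It assumes `*` marks a required field, as is conventional. All class names, field names, and the `missing_required` helper are hypothetical illustrations, not the product's actual API or storage format. The example values `"qa"` and `"exact_match"` are also made up. The sketch does not check that the gold answer appears among the choices, because the form does not say whether that is enforced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    input: str          # required (*)
    choices_text: str   # required (*): raw field value, one choice per line
    gold_answer: str    # required (*)

    def choices(self) -> List[str]:
        # Split the raw text into choices, dropping blank lines and surrounding whitespace.
        return [line.strip() for line in self.choices_text.splitlines() if line.strip()]


@dataclass
class Task:
    task_key: str                          # required (*)
    display_name: str                      # required (*)
    weight: Optional[float] = None         # optional
    task_type: Optional[str] = None        # optional
    dataset_source: Optional[str] = None   # optional: e.g. inline or bucket-backed
    prompt_template: Optional[str] = None  # optional
    samples: List[Sample] = field(default_factory=list)


@dataclass
class EvalSuite:
    name: str      # required (*)
    slug: str      # required (*)
    category: str  # required (*)
    runner: str    # required (*)
    scoring: str   # required (*)
    description: Optional[str] = None
    source_url: Optional[str] = None
    tasks: List[Task] = field(default_factory=list)


def missing_required(suite: EvalSuite) -> List[str]:
    """Return the path of every required (*) field left blank (hypothetical helper)."""
    missing: List[str] = []
    for name in ("name", "slug", "category", "runner", "scoring"):
        if not getattr(suite, name).strip():
            missing.append(name)
    # Tasks and samples are numbered from 1, matching "Task 1" and "Sample 1" in the form.
    for i, task in enumerate(suite.tasks, start=1):
        for name in ("task_key", "display_name"):
            if not getattr(task, name).strip():
                missing.append(f"tasks[{i}].{name}")
        for j, sample in enumerate(task.samples, start=1):
            prefix = f"tasks[{i}].samples[{j}]"
            if not sample.input.strip():
                missing.append(f"{prefix}.input")
            if not sample.choices():
                missing.append(f"{prefix}.choices")
            if not sample.gold_answer.strip():
                missing.append(f"{prefix}.gold_answer")
    return missing


# Usage: a one-task suite with an empty runner field.
suite = EvalSuite(
    name="Demo", slug="demo", category="qa", runner="", scoring="exact_match",
    tasks=[Task(task_key="t1", display_name="Task 1",
                samples=[Sample(input="2 + 2 = ?", choices_text="3\n4\n\n5", gold_answer="4")])],
)
print(missing_required(suite))              # ['runner']
print(suite.tasks[0].samples[0].choices())  # ['3', '4', '5']
```

The sketch stores "Choices, one per line" as raw text and parses it on demand. This mirrors how a multi-line text field would be submitted, and it keeps blank lines from becoming empty choices.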