Methodology
What This Benchmark Measures
Leaderboard
Model Scores
Consistency
Attempt Distribution by Model
Each row shows one model's five attempt scores, with all rows drawn on the same horizontal scale. The darker marker in each row is the median of those five scores.
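As a rough illustration of what each row summarizes, the sketch below computes the median of five attempt scores per model, plus a min-to-max spread as one possible way to read consistency. The model names, the scores, and the `summarize` helper are hypothetical placeholders, not values or code from this benchmark, and the spread measure is an assumption rather than the metric the chart uses.

```python
from statistics import median

# Hypothetical attempt scores for illustration only; not benchmark results.
attempts = {
    "model_a": [12.0, 15.5, 9.0, 14.0, 13.5],
    "model_b": [20.0, 4.0, 11.0, 18.5, 7.5],
}


def summarize(scores):
    """Return the median and the min-to-max spread of one model's attempts."""
    if len(scores) != 5:
        raise ValueError("expected five attempt scores per model")
    return {"median": median(scores), "spread": max(scores) - min(scores)}


# One summary per model, mirroring one row of the distribution chart.
summary = {model: summarize(scores) for model, scores in attempts.items()}

for model, stats in summary.items():
    print(f"{model}: median={stats['median']}, spread={stats['spread']}")
```

With an odd number of attempts the median is always one of the actual scores, so the darker marker would coincide with a real attempt rather than an interpolated value.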
Attempts
Attempt Table
Build Transcript
How Each Bot Was Built
Opponent Breakdown
Score Against Each Benchmark Opponent
Why The Scores Differ
What The Better Bots Are Doing
Hand Replay