Performance Data
Live benchmark scores and pricing for every LLM and AI tool tracked by AIDiveForge. Data is sourced from Artificial Analysis, Vellum, and community submissions, and is updated daily.
| Tool | MMLU | HumanEval | GPQA | Context (tokens) | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|---|---|
| Claude | 91.1% | — | 95.4% | — | — | — |
| Gemini | 91.8% | — | 91.9% | — | — | — |
| Qwen2.5 72B | — | — | — | — | — | — |
| DBRX Instruct | 80.7% | 87.5% | — | — | $0.50 | $1.50 |
| Mistral Large 2 | 86.2% | 89.8% | 48.6% | 128,000 | $2.00 | $6.00 |
| o1 | 92.3% | 94.5% | 96.5% | — | $15.00 | $60.00 |
| Command R7B | — | — | — | — | — | — |
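
To turn the pricing columns into a per-request estimate, a minimal Python sketch follows. It assumes the two price columns are USD per one million tokens and that input and output tokens are billed separately. The function name `request_cost` and the token counts are hypothetical, chosen only for illustration. The prices are the Mistral Large 2 row from the table.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumes prices are quoted in USD per one million tokens (an assumption
    about the table's units) and that there are no other fees.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Mistral Large 2 row: $2.00 input and $6.00 output per 1M tokens.
# The 10,000 / 2,000 token counts are made up for the example.
cost = request_cost(10_000, 2_000, 2.00, 6.00)
print(f"${cost:.4f}")
```

Rows that show a dash in either price column have no listed price, so no estimate can be computed for them from this table.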