AIDiveForge

Performance Data

Live benchmark scores and pricing for every LLM and AI tool tracked by AIDiveForge. Data is sourced from Artificial Analysis, Vellum, and community submissions, and is updated daily.

Tool MMLU HumanEval GPQA Context (tokens) Input $/1M tokens Output $/1M tokens
Claude 91.1% 95.4%
Gemini 91.8% 91.9%
Qwen2.5 72B
DBRX Instruct 80.7% 87.5% $0.50 $1.50
Mistral Large 2 86.2% 89.8% 48.6% 128,000 $2.00 $6.00
o1 92.3% 94.5% 96.5% $15.00 $60.00
Command R7B
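
As a rough illustration of how the per-million-token prices in the table translate into a per-request cost, the following Python sketch uses the Mistral Large 2 row ($2.00 input, $6.00 output per 1M tokens). The function name `request_cost` and the token counts are hypothetical, chosen only for this example. It assumes cost scales linearly with token count, which may not hold for providers with tiered or cached pricing.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Hypothetical helper: assumes simple linear pricing with no tiers or discounts.
    """
    tokens_per_million = 1_000_000
    input_cost = input_tokens / tokens_per_million * input_price_per_m
    output_cost = output_tokens / tokens_per_million * output_price_per_m
    return input_cost + output_cost


# Prices taken from the Mistral Large 2 row of the table above.
MISTRAL_LARGE_2_INPUT = 2.00   # USD per 1M input tokens
MISTRAL_LARGE_2_OUTPUT = 6.00  # USD per 1M output tokens

# Hypothetical request: 10,000 prompt tokens and 2,000 completion tokens.
cost = request_cost(10_000, 2_000, MISTRAL_LARGE_2_INPUT, MISTRAL_LARGE_2_OUTPUT)
print(f"${cost:.4f}")  # prints $0.0320
```

Because output tokens are priced at three times the input rate in this row, a response-heavy workload can cost noticeably more than the input price alone suggests.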