Performance Data

Live benchmark scores and pricing for every LLM and AI tool tracked by AIDiveForge. Data is sourced from Artificial Analysis, Vellum, and community submissions, and is updated daily.

Tool	MMLU	HumanEval	GPQA	Context (tokens)	Input ($/1M tokens)	Output ($/1M tokens)
Claude	91.1%	—	95.4%	—	—	—
Gemini	91.8%	—	91.9%	—	—	—
Qwen2.5 72B	—	—	—	—	—	—
DBRX Instruct	80.7%	87.5%	—	—	$0.50	$1.50
Mistral Large 2	86.2%	89.8%	48.6%	128,000	$2.00	$6.00
o1	92.3%	94.5%	96.5%	—	$15.00	$60.00
Command R7B	—	—	—	—	—	—
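
The pricing columns can be turned into a per-request cost estimate. The sketch below assumes the two price columns are USD per one million tokens; the function name `request_cost` and the token counts are hypothetical, chosen only for illustration. The prices are taken from the Mistral Large 2 row above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumes prices are quoted per one million tokens, billed separately
    for input and output.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Mistral Large 2 row: $2.00 input and $6.00 output per 1M tokens.
# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(10_000, 2_000, 2.00, 6.00)
print(f"${cost:.4f}")  # prints $0.0320
```

Rows that show "—" in a price column have no published price in this table, so no estimate can be computed for them.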