Model Benchmark Scores
Aggregated from multiple sites; last updated 06.25. Combines public leaderboards, including LMArena and Vals AI SWE-bench Verified (06.17).
| # | Model | Arena | Code repair | Scientific reasoning | Math competition | General knowledge | Extreme exam |
|---|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 (discontinued) Anthropic | 1515 | 95.0% | 95% | 97% | 92% | 35% |
| 2 | Claude Opus 4.8 Anthropic | 1512 | 88.6% | 94% | 97% | 92% | 32% |
| 3 | GPT-5.5 Pro OpenAI | 1508 | 83.5% | 94% | 98% | 91% | 30% |
| 4 | GPT-5.5 OpenAI | 1500 | 82.6% | 94% | 97% | 90% | 28% |
| 5 | Claude Opus 4.7 Anthropic | 1503 | 82.0% | 92% | 95% | 90% | 27% |
| 6 | Gemini 3.1 Pro Preview Google | 1505 | 79.1% | 94% | 96% | 91% | 29% |
| 7 | Gemini 3.5 Flash Google | 1473 | 78.8% | 91% | 93% | 89% | 24% |
| 8 | GPT-5.4 (xhigh) OpenAI | — | 78.0% | — | — | — | — |
| 9 | Claude Opus 4.6 (Thinking) Anthropic | — | 77.9% | — | — | — | — |
| 10 | GPT-5.3 Codex OpenAI | — | 77.8% | — | — | — | — |
| 11 | DeepSeek V4 (open source) DeepSeek | 1410 | 77.6% | 86% | 91% | 87% | 20% |
| 12 | Claude Sonnet 4.6 Anthropic | 1425 | 77.3% | 88% | 92% | 87% | 21% |
| 13 | Claude Opus 4.5 (Thinking) Anthropic | — | 76.7% | — | — | — | — |
| 14 | GLM 5.1 (open source) Zhipu | 1360 | 76.6% | 82% | 87% | 86% | 16% |
| 15 | Gemini 3 Pro Google | — | 76.6% | — | — | — | — |
| 16 | Kimi K2.6 (open source) Moonshot AI | 1345 | 76.1% | 84% | 88% | 85% | 17% |
| 17 | Grok 4.3 xAI | 1496 | 76.0% | 90% | 94% | 88% | 24% |
| 18 | GPT-5.2 OpenAI | 1464 | 75.5% | 91% | 95% | 88% | 26% |
| 19 | Gemini 3 Flash Google | — | 75.1% | — | — | — | — |
| 20 | MiniMax-M2.1 MiniMax | — | 75.0% | — | — | — | — |
| 21 | MiniMax-M2.5 MiniMax | — | 74.3% | — | — | — | — |
| 22 | Qwen 3.7 Max Tongyi Qianwen | 1488 | 74.2% | 88% | 93% | 89% | 22% |
| 23 | MiniMax-M2.7 MiniMax | — | 73.8% | — | — | — | — |
| 24 | GLM 5 (open source) Zhipu | — | 73.5% | — | — | — | — |
| 25 | Qwen 3.5 Plus Tongyi Qianwen | — | 72.8% | — | — | — | — |
| 26 | Claude Sonnet 4.5 (Thinking) Anthropic | — | 72.0% | — | — | — | — |
| 27 | Kimi K2.5 (open source) Moonshot AI | — | 71.5% | — | — | — | — |
| 28 | GPT-5.1 OpenAI | — | 71.0% | — | — | — | — |
| 29 | GPT-5.4 Mini OpenAI | — | 70.5% | — | — | — | — |
| 30 | GLM 4.7 (open source) Zhipu | — | 69.8% | — | — | — | — |
| 31 | GPT-5 OpenAI | 1490 | 69.5% | 90% | 94% | 88% | 25% |
| 32 | DeepSeek V3.2 (Thinking) (open source) DeepSeek | — | 69.0% | — | — | — | — |
| 33 | Claude Haiku 4.5 (Thinking) Anthropic | — | 68.5% | — | — | — | — |
| 34 | Llama 4.5 Maverick (open source) Meta | 1370 | 68.5% | 80% | 82% | 84% | 14% |
| 35 | Qwen 3.5 Flash Tongyi Qianwen | — | 68.0% | — | — | — | — |
| 36 | Grok 4.20 (Reasoning) xAI | — | 67.5% | — | — | — | — |
| 37 | Gemini 2.5 Pro Google | 1450 | 66.5% | 86% | 90% | 86% | 22% |
| 38 | Grok 4 xAI | — | 65.5% | — | — | — | — |
| 39 | Mistral Large 3 Mistral | 1352 | 65.0% | 78% | 80% | 82% | 12% |
| 40 | Devstral 2 (open source) Mistral | — | 62.0% | — | — | — | — |