Model Benchmarks

Aggregated from multiple sites; last updated 06.25. Combines public leaderboards including the LMArena arena and Vals AI SWE-bench Verified (06.17).

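| # | Model | Vendor | Notes | Arena | Code Repair | Scientific Reasoning | Math Competition | General Knowledge | Extreme Exam |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Fable 5 | Anthropic | Discontinued | 1515 | 95.0% | 95% | 97% | 92% | 35% |
| 2 | Claude Opus 4.8 | Anthropic | | 1512 | 88.6% | 94% | 97% | 92% | 32% |
| 3 | GPT-5.5 Pro | OpenAI | | 1508 | 83.5% | 94% | 98% | 91% | 30% |
| 4 | GPT-5.5 | OpenAI | | 1500 | 82.6% | 94% | 97% | 90% | 28% |
| 5 | Claude Opus 4.7 | Anthropic | | 1503 | 82.0% | 92% | 95% | 90% | 27% |
| 6 | Gemini 3.1 Pro Preview | Google | | 1505 | 79.1% | 94% | 96% | 91% | 29% |
| 7 | Gemini 3.5 Flash | Google | | 1473 | 78.8% | 91% | 93% | 89% | 24% |
| 8 | GPT-5.4 (xhigh) | OpenAI | | | 78.0% | | | | |
| 9 | Claude Opus 4.6 (Thinking) | Anthropic | | | 77.9% | | | | |
| 10 | GPT-5.3 Codex | OpenAI | | | 77.8% | | | | |
| 11 | DeepSeek V4 | DeepSeek | Open source | 1410 | 77.6% | 86% | 91% | 87% | 20% |
| 12 | Claude Sonnet 4.6 | Anthropic | | 1425 | 77.3% | 88% | 92% | 87% | 21% |
| 13 | Claude Opus 4.5 (Thinking) | Anthropic | | | 76.7% | | | | |
| 14 | GLM 5.1 | Zhipu | Open source | 1360 | 76.6% | 82% | 87% | 86% | 16% |
| 15 | Gemini 3 Pro | Google | | | 76.6% | | | | |
| 16 | Kimi K2.6 | Moonshot AI | Open source | 1345 | 76.1% | 84% | 88% | 85% | 17% |
| 17 | Grok 4.3 | xAI | | 1496 | 76.0% | 90% | 94% | 88% | 24% |
| 18 | GPT-5.2 | OpenAI | | 1464 | 75.5% | 91% | 95% | 88% | 26% |
| 19 | Gemini 3 Flash | Google | | | 75.1% | | | | |
| 20 | MiniMax-M2.1 | MiniMax | | | 75.0% | | | | |
| 21 | MiniMax-M2.5 | MiniMax | | | 74.3% | | | | |
| 22 | Qwen 3.7 Max | Tongyi Qianwen | | 1488 | 74.2% | 88% | 93% | 89% | 22% |
| 23 | MiniMax-M2.7 | MiniMax | | | 73.8% | | | | |
| 24 | GLM 5 | Zhipu | Open source | | 73.5% | | | | |
| 25 | Qwen 3.5 Plus | Tongyi Qianwen | | | 72.8% | | | | |
| 26 | Claude Sonnet 4.5 (Thinking) | Anthropic | | | 72.0% | | | | |
| 27 | Kimi K2.5 | Moonshot AI | Open source | | 71.5% | | | | |
| 28 | GPT-5.1 | OpenAI | | | 71.0% | | | | |
| 29 | GPT-5.4 Mini | OpenAI | | | 70.5% | | | | |
| 30 | GLM 4.7 | Zhipu | Open source | | 69.8% | | | | |
| 31 | GPT-5 | OpenAI | | 1490 | 69.5% | 90% | 94% | 88% | 25% |
| 32 | DeepSeek V3.2 (Thinking) | DeepSeek | Open source | | 69.0% | | | | |
| 33 | Claude Haiku 4.5 (Thinking) | Anthropic | | | 68.5% | | | | |
| 34 | Llama 4.5 Maverick | Meta | Open source | 1370 | 68.5% | 80% | 82% | 84% | 14% |
| 35 | Qwen 3.5 Flash | Tongyi Qianwen | | | 68.0% | | | | |
| 36 | Grok 4.20 (Reasoning) | xAI | | | 67.5% | | | | |
| 37 | Gemini 2.5 Pro | Google | | 1450 | 66.5% | 86% | 90% | 86% | 22% |
| 38 | Grok 4 | xAI | | | 65.5% | | | | |
| 39 | Mistral Large 3 | Mistral | | 1352 | 65.0% | 78% | 80% | 82% | 12% |
| 40 | Devstral 2 | Mistral | Open source | | 62.0% | | | | |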