xAI's Grok-3 Outperforms Competitors in Benchmark Tests

Source: Manufactry | 2025-02-18 18:54

xAI today released its new-generation large language model, Grok-3, along with a lightweight version, Grok-3 mini. The latest benchmark results show Grok-3 with a significant lead over DeepSeek in head-to-head comparisons.

On the mathematics benchmark (AIME '24), Grok-3 scored 52 points, well ahead of DeepSeek-V3's 39. On the science knowledge benchmark (GPQA), Grok-3 led with 75 points to DeepSeek-V3's 65. On the programming benchmark (LiveCodeBench, Oct-Feb), Grok-3 scored 57 points, surpassing DeepSeek-V3's 36.

On the newly released AIME 2025 test, scored on a combination of reasoning and compute time, Grok-3 Reasoning Beta achieved 93 points and the lightweight Grok-3 mini reached 90. By comparison, DeepSeek-R1 scored 75 and Gemini 2.0 Flash Thinking only 54. This result underscores Grok-3's strength in complex mathematical reasoning and computational efficiency.

Notably, DeepSeek's recently released DeepSeek-R1 also trailed Grok-3 on other reasoning benchmarks: in mathematical reasoning, Grok-3 scored 93 points to DeepSeek-R1's 73; in scientific reasoning, 85 to 74; and in programming reasoning, 79 to 65.

In addition, Grok-3 scored approximately 1400 points in the LMSYS Chatbot Arena, ranking above not only the DeepSeek series but also other mainstream large models, including GPT-4 and Claude. These results indicate that despite DeepSeek's strong momentum over the past few months, Grok-3 still leads in overall performance. Its advantages are most pronounced in mathematical reasoning and computational efficiency, reflecting both xAI's technical strength in model development and the intensity of competition in the AI field.
