xAI's Grok-3 Outperforms Competitors in Benchmark Tests

Updated: 2025-02-18 18:54  Source: Manufactry

xAI today released its new-generation large language model, Grok-3, along with a lightweight version, Grok-3 mini. The latest benchmark results show Grok-3 holding a significant advantage in direct comparisons with DeepSeek's models.

In the mathematical ability test (AIME '24), Grok-3 scored 52 points, significantly exceeding DeepSeek-V3's 39 points. In the scientific knowledge assessment (GPQA), Grok-3 led with a score of 75 points, while DeepSeek-V3 scored 65 points. In the programming ability test (LCB Oct - Feb), Grok-3 also scored 57 points, surpassing DeepSeek-V3's 36 points.
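For readers who want to check the gaps themselves, the following is a minimal Python sketch that tabulates the three scores quoted above and computes Grok-3's lead on each benchmark. The names `reported_scores` and `margins` are illustrative only, and the numbers are simply those reported in this article, not independently verified results.

```python
# Scores as reported in the article: (Grok-3, DeepSeek-V3).
# The variable and function names here are hypothetical, chosen for illustration.
reported_scores = {
    "AIME '24": (52, 39),
    "GPQA": (75, 65),
    "LCB Oct - Feb": (57, 36),
}


def margins(scores):
    """Return Grok-3's lead over DeepSeek-V3, in points, for each benchmark."""
    return {name: grok - deepseek for name, (grok, deepseek) in scores.items()}


if __name__ == "__main__":
    for name, gap in margins(reported_scores).items():
        print(f"{name}: Grok-3 leads by {gap} points")
```

On these figures the lead is 13 points in mathematics, 10 in science, and 21 in programming, so the programming test shows the widest gap.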

On the newly announced AIME 2025 test, the Grok-3 Reasoning Beta version scored 93 points on a composite measure of reasoning and computation time, and the lightweight Grok-3 mini reached 90 points. By comparison, DeepSeek-R1 scored 75 points and Gemini-2 Flash Thinking scored only 54 points. The result further underlines Grok-3's strength in complex mathematical reasoning and computational efficiency.

Notably, DeepSeek's recently released DeepSeek-R1 also failed to outperform Grok-3 in other reasoning ability tests. In mathematical reasoning, Grok-3 scored 93 points, while DeepSeek-R1 scored 73 points; in scientific reasoning, Grok-3 scored 85 points, and DeepSeek-R1 scored 74 points; in programming reasoning, Grok-3 reached 79 points, while DeepSeek-R1 scored 65 points.

In addition, in the LMSYS Chatbot Arena assessment, Grok-3 scored approximately 1400 points, exceeding not only the DeepSeek series but also other mainstream large models, including GPT-4 and Claude. These figures indicate that, despite DeepSeek's strong momentum over the past few months, Grok-3 still holds a leading position in overall performance. Its advantage is most pronounced in mathematical reasoning and computational efficiency, which reflects xAI's technical strength in model development and also shows how intense competition in the AI field has become.
