Elon Musk Unveils Grok 3: How It Performs Against OpenAI’s GPT-4o & DeepSeek

Grok 3 and Grok 3 mini outperformed Google DeepMind’s Gemini-2 Pro, DeepSeek-V3, Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o on various benchmarks

Outlook Start-Up Desk

Published At: 18 February 2025 12:20 pm

Elon Musk’s AI start-up xAI on Tuesday launched its latest chatbot, Grok 3, which Musk calls the “Smartest AI on Earth.” Musk, along with members of his team including Igor Babuschkin, hosted a launch demo to showcase the model’s performance on various benchmarks relative to other leading models such as OpenAI’s GPT-4o and DeepSeek’s V3.

During the demonstration, xAI executives explained how they developed Grok 3. They said that building their own data center was the only way to produce the best AI, and that because they wanted to launch Grok 3 as soon as possible, they had only four months to construct it.

The first 100,000 GPUs were operational in 122 days, according to xAI, which described the process as a "monumental effort."

The start-up added that the H100 cluster then needed to be doubled in size, so it began a second phase in which capacity was doubled in 92 days. The executives further said, “We’ve used all this computing power to continuously improve the product along the way.”

Grok 3 Benchmark Performance

According to data presented at the launch event, xAI’s large language models (LLMs) Grok 3 and Grok 3 mini outperformed Google DeepMind’s Gemini-2 Pro, DeepSeek-V3, Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o on various benchmarks.

On the math (AIME’24), science (GPQA) and coding (LiveCodeBench, Oct–Feb) benchmarks, Grok 3 scored 52, 75 and 57, respectively, while Grok 3 mini scored 40, 65 and 41. Both models scored higher than the Gemini, DeepSeek, Claude and GPT models they were compared against.

Based on the data shown, an early version of Grok 3 ranked above major models including Gemini-2.0 Flash Thinking, GPT-4o (latest), DeepSeek-R1 and o1-preview, among others, on Chatbot Arena (LMSYS).

Chatbot Arena is a free, open-source platform that lets users compare and evaluate large language models (LLMs). It was developed by the Large Model Systems Organization (LMSYS Org).

In the “Reasoning + Test-Time Compute” comparison, Grok 3 Reasoning (Beta) and Grok 3 mini Reasoning scored higher than o3-mini (high), o1, DeepSeek-R1 and Gemini-2.0 Flash Thinking.