
Elon Musk Unveils Grok 3: How It Performs Against OpenAI’s GPT-4o & DeepSeek

Grok 3 and Grok 3 mini outperformed Google DeepMind’s Gemini-2 Pro, DeepSeek-V3, Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o on various benchmarks


Elon Musk’s AI start-up xAI on Tuesday launched its latest chatbot, Grok 3, which Musk called the “Smartest AI on Earth.” Musk and his team, including Igor Babuschkin, hosted a launch demo to showcase the model’s performance on various benchmarks relative to other leading models such as OpenAI’s GPT-4o and DeepSeek’s V3.

During the demonstration, xAI executives described how they developed Grok. They claimed that building their own data center was the only way to produce the best AI. Because they wanted to launch Grok 3 as soon as possible, they said, they had only four months to construct the data center.

The first 100,000 GPUs were operational in 122 days, according to xAI, which described the process as a "monumental effort."

The start-up said the H100 cluster then needed to be doubled in size, so it began a second phase in which capacity was doubled in 92 days. The executives added, "We’ve used all this computing power to continuously improve the product along the way."


Benchmark Performance

According to data displayed at the launch event, xAI’s large language models (LLMs) Grok 3 and Grok 3 mini outperformed Google DeepMind’s Gemini-2 Pro, DeepSeek-V3, Anthropic’s Claude 3.5 Sonnet and OpenAI’s GPT-4o on various benchmarks.

On the Math (AIME’24), Science (GPQA) and Coding (LCB Oct-Feb) benchmarks, Grok 3 scored 52, 75 and 57, respectively, while Grok 3 mini scored 40, 65 and 41. Both models scored higher than the Gemini, DeepSeek, Claude and GPT models they were compared against.

Based on the displayed data, an early version of Grok 3 outperformed major models including Gemini 2.0 Flash Thinking, the latest GPT-4o, DeepSeek-R1, o1-preview and many more on Chatbot Arena (LMSYS).

Chatbot Arena is a free, open-source platform that lets users compare and evaluate large language models (LLMs). It was developed by the Large Model Systems Organization (LMSYS Org).

On the Reasoning + Test-Time Compute comparison, Grok 3 Reasoning Beta and Grok 3 mini Reasoning scored higher than o3-mini (high), o1, DeepSeek-R1 and Gemini-2 Flash Thinking.
