Outlook Business Desk
Beijing-based AI lab Moonshot AI unveiled Kimi K2 Thinking on Thursday. The company claims the model outperforms GPT-5 and Claude Sonnet 4.5 on Humanity’s Last Exam, BrowseComp, and Seal-0, benchmarks that test reasoning, problem-solving, and online information retrieval.
Moonshot AI says K2 Thinking can plan, reason, execute, and adapt over hundreds of steps, tackling tough academic and analytical challenges. It uses a Mixture-of-Experts architecture, in which only a subset of specialised sub-networks (the experts) is activated for each input, so the model can handle complex tasks without running every parameter at once.
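For readers unfamiliar with the idea, the routing step in a Mixture-of-Experts layer can be sketched in a few lines of Python. This is a generic toy illustration, not Moonshot AI's implementation: the top-k routing rule, the helper names (make_expert, moe_forward), and the four tiny "experts" are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a stand-in sub-network that just scales its input."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    router_weights holds one weight row per expert (an assumed linear router).
    """
    # Score each expert with a dot product between its router row and the input.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalise the selected scores into mixing weights.
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top


# Toy setup: four experts, of which only two run per input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
output, chosen = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen, output)
```

In this toy setup only two of the four experts do any work for the input, which is the source of the efficiency the architecture is known for; production models apply the same idea per token and at far larger scale.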
On the text-based Humanity’s Last Exam benchmark, Kimi K2 Thinking scored 44.9%, ahead of GPT-5’s 41.7% and Claude Sonnet 4.5 Thinking’s 32%, a result that points to strong comprehension and reasoning abilities.
Kimi K2 Thinking also topped the BrowseComp benchmark, which tests agentic web-browsing skills. It scored 60.2%, higher than GPT-5’s 54.9% and Claude’s 24.1%, indicating strong web-search capability.
Although Kimi K2 Thinking shows strong reasoning and browsing abilities, it still lags behind GPT-5 in programming benchmarks. On LiveCodeBench V6, it achieved 83.1%, compared to GPT-5’s 87% and Claude’s 64%.
That success did not carry over to LMArena’s rankings, where Google’s Gemini 2.5 Pro ranked first and GPT-5 fourth, a sign that leading AI models have different strengths.
Kimi K2 Thinking is available for free through its website and app, without strict message caps. In contrast, ChatGPT caps usage for free users and limits messages even on its ₹399/month Go plan in India.