
Microsoft Unveils Phi-4-Reasoning Small Language Models to Rival OpenAI’s o3-Mini


Microsoft on Wednesday unveiled three small language models: Phi-4-reasoning, Phi-4-reasoning-plus and Phi-4-mini-reasoning. The most capable of these models demonstrates performance comparable to OpenAI’s o3-mini on at least one benchmark.


“We are excited to introduce Phi-4-reasoning, Phi-4-reasoning-plus and Phi-4-mini-reasoning—marking a new era for small language models and once again redefining what is possible with small and efficient AI,” Microsoft stated in a blog post on its website.

As their names suggest, the new permissively licensed models are reasoning models, designed to spend more inference time working through and verifying answers to complex queries. They expand Microsoft's Phi family of small models, introduced a year ago to give AI developers a foundation for building edge applications.

Phi-4-reasoning and Phi-4-reasoning-plus

Phi-4-reasoning is a 14-billion-parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained through supervised fine-tuning on carefully selected reasoning demonstrations from OpenAI’s o3-mini, it constructs detailed reasoning chains, effectively leveraging additional inference-time computation.

The model demonstrates that meticulous data curation and high-quality synthetic datasets enable smaller models to compete with larger ones.
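To make that concrete, here is a hypothetical sketch of what one curated reasoning demonstration could look like in chat format. The field names and the <think> delimiters are illustrative assumptions, not Microsoft's published data schema.

```python
# A hypothetical supervised fine-tuning example: a prompt paired with a
# teacher-written reasoning trace and a final answer. Field names and the
# <think> delimiters are illustrative assumptions, not a published schema.
sft_example = {
    "messages": [
        {"role": "user", "content": "If 3x + 7 = 25, what is x?"},
        {
            "role": "assistant",
            "content": (
                "<think>Subtract 7 from both sides: 3x = 18. "
                "Divide both sides by 3: x = 6.</think>\n"
                "x = 6"
            ),
        },
    ]
}
```

Fine-tuning on many such traces teaches the smaller model to reproduce the teacher's step-by-step style rather than just its final answers.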


Phi-4-reasoning-plus builds on Phi-4-reasoning's capabilities with reinforcement learning that encourages longer chains of thought, consuming roughly 1.5 times as many inference-time tokens as Phi-4-reasoning to achieve greater accuracy.
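One way to observe that extra inference-time budget is to count the tokens a model spends inside its reasoning trace, as in the sketch below. It assumes the models wrap their chain of thought in <think> tags, per the convention described on their Hugging Face model cards; the repo id is likewise an assumption drawn from that listing.

```python
from transformers import AutoTokenizer

# Count the tokens spent inside the <think>...</think> reasoning trace.
# The tag convention and repo id are assumptions from the model card.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")

def reasoning_token_count(output_text: str) -> int:
    start = output_text.find("<think>")
    end = output_text.find("</think>")
    if start == -1 or end == -1 or end < start:
        return 0
    trace = output_text[start + len("<think>"):end]
    return len(tok.encode(trace, add_special_tokens=False))
```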

Despite their significantly smaller size, both Phi-4-reasoning and Phi-4-reasoning-plus surpass OpenAI’s o1-mini and DeepSeek-R1-Distill-Llama-70B on most benchmarks, including mathematical reasoning and PhD-level science problems.

They also outperform the full DeepSeek-R1 model (671 billion parameters) on AIME 2025, the 2025 qualifying exam for the US Math Olympiad. Both models are accessible through Azure AI Foundry and Hugging Face.
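For those who want to try the model locally, a minimal sketch using the Hugging Face transformers library follows. The repo id is assumed from the Hugging Face listing, and the sampling settings are illustrative defaults rather than recommended values; Phi-4-reasoning-plus should load the same way under its own repo id.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"  # assumed Hugging Face repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models spend tokens on their chain of thought before the
# final answer, so allow a generous generation budget.
out = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.8)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```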

Phi-4-mini-reasoning

Phi-4-mini-reasoning is designed to meet the demand for a compact reasoning model. This transformer-based language model is optimised for mathematical reasoning, delivering high-quality, step-by-step problem solving in environments with limited computational resources or tight latency budgets.

Fine-tuned using synthetic data generated by the DeepSeek-R1 model, Phi-4-mini-reasoning balances efficiency with advanced reasoning capabilities.

Trained on more than 1 million diverse math problems spanning middle-school to PhD level, it is well suited to educational applications, embedded tutoring and lightweight deployment on edge or mobile systems. The model is available for testing on Azure AI Foundry and Hugging Face.
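As one sketch of that resource-constrained use case, the snippet below loads Phi-4-mini-reasoning with 4-bit quantisation via bitsandbytes. The repo id and quantisation settings are assumptions, and 4-bit loading requires a CUDA GPU, so treat this as an illustration rather than a recommended configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed Hugging Face repo id

# 4-bit weights shrink the memory footprint for constrained hardware.
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

messages = [{"role": "user", "content": "Solve for x: 2x^2 - 8 = 0"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=1024)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```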
