Anthropic Unveils Claude Opus 4.5 to Challenge ChatGPT and Gemini: All You Need to Know

Outlook Business Desk

New AI Launch

Anthropic has released Claude Opus 4.5, calling it its most capable model yet and claiming it leads the field in coding, agentic tasks, and computer use.

Benchmark Leader

Opus 4.5 scores 80.9% on SWE-bench Verified, making it the first model to pass the 80% mark on the benchmark.

Rivals Compared

On SWE-bench Verified, Claude Opus 4.5 finishes ahead of Gemini 3 Pro (76.2%) and GPT-5.1-Codex-Max (77.9%), leading the two recently launched rivals by 4.7 and 3.0 percentage points respectively.

Human Test Edge

Anthropic says Opus 4.5 scored higher than any human candidate on its two-hour engineering assessment, a test designed to gauge technical ability and judgement under time pressure. The company notes that the test does not measure broader skills such as collaboration and communication.

Agentic Capability Gains

On τ2-bench, which evaluates how well agents handle multi-turn, real-world tasks such as customer-service interactions, Claude Opus 4.5 outperforms rival models, suggesting more reliable step-by-step execution in practical scenarios.

Smart Problem Solving

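In one airline customer-service scenario, the benchmark expected the model to refuse a change to a basic economy booking, which the airline's policy does not allow. Opus 4.5 instead found a legitimate workaround within that policy: it upgraded the cabin first and then modified the flights, a solution the benchmark's designers had not anticipated.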

Safety Upgrades Added

Anthropic presents Opus 4.5 as its safest model so far, citing improved resistance to prompt-injection attacks, in which malicious instructions hidden in the model's inputs try to trick it into unintended or harmful actions.

Available to Use

Claude Opus 4.5 is available now in the Claude apps for Android and iOS and on the web, and developers can start integrating it into their own products immediately.