OpenAI is not interested in letting the narrative settle. On Thursday, the company released its newest AI model, GPT-5.5, to paid subscribers, just six weeks after debuting GPT-5.4, a turnaround pace that would have seemed reckless even by the standards of the current AI arms race. The release lands at a charged moment: Anthropic has been quietly eating into OpenAI’s enterprise customer base, and the social media conversation had been tilting toward a story that the ChatGPT maker was losing its edge.
GPT-5.5, internally codenamed “Spud,” is OpenAI’s answer to that story.
OpenAI co-founder and president Greg Brockman described it as “a new class of intelligence,” calling it a significant step toward more agentic and autonomous computing. More specifically, OpenAI says the model is better at coding, using computers, and conducting deeper research, with particular improvements in its ability to complete multi-step workflows with less user guidance. The key phrase there is “less guidance.” The industry term is agentic AI, and it is increasingly where the real enterprise money is.
OpenAI’s chief research officer Mark Chen said GPT-5.5 is better at navigating computer work than its predecessors and shows meaningful gains on scientific and technical research workflows, noting that the company believes the model could genuinely help expert scientists make progress, including in areas like drug discovery.
The benchmark numbers tell a nuanced story, which is worth reading carefully rather than taking OpenAI’s framing at face value. On Terminal-Bench 2.0, a test of a model’s ability to complete tasks in a sandboxed terminal environment, GPT-5.5 achieved 82.7% accuracy, narrowly ahead of Anthropic’s Claude Mythos Preview at 82.0% and well above Claude Opus 4.7 at 69.4%. That headline figure is what OpenAI’s communications machine will be amplifying, and for the coding and agentic use cases that matter most to enterprise buyers, it is a meaningful result.
But the picture looks different in other areas. On Humanity’s Last Exam without tools, a measure of raw multidisciplinary reasoning, GPT-5.5 Pro scored 43.1%, trailing Opus 4.7 at 46.9% and Mythos Preview at a considerable 56.8%. That gap in abstract reasoning is not a rounding error. It suggests that while OpenAI is winning on applied, task-execution intelligence, Anthropic’s models retain a meaningful lead where the task is harder, more knowledge-intensive reasoning without the scaffolding of tools.
Across the fuller benchmark comparison, Mythos leads on SWE-bench Pro, Humanity’s Last Exam both with and without tools, CyberGym, and OSWorld-Verified, while GPT-5.5 and Mythos are effectively tied on Terminal-Bench 2.0 and GPQA Diamond. The honest read, then, is that GPT-5.5 is genuinely the best widely available model for agentic and coding tasks, while Anthropic’s Mythos remains ahead in several capability dimensions. The catch is that most enterprises will never touch Mythos.
Anthropic released Claude Mythos Preview earlier this month but limited its rollout to select partners, citing the model’s advanced cybersecurity capabilities, specifically its ability to identify weaknesses and security flaws within software. That restricted availability fundamentally shapes how any head-to-head comparison should be read: with Mythos Preview not commercially available, the real market contest is among publicly accessible models. In that contest, OpenAI has reasserted itself clearly.
The rollout of GPT-5.5 is itself telling about where OpenAI’s priorities lie. The model is now available to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, though GPT-5.5 and its Pro variant are not launching to the API immediately, as OpenAI says API deployments require different safeguards and the company is working with partners on safety and security requirements for serving it at scale. That caveat matters for developers and enterprise integrations built on the API, where Anthropic currently has a distribution advantage. Claude Opus 4.7 has been generally available since April 16 across the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, giving procurement teams with existing cloud commitments a more immediately deployable option.
The super app angle is also worth watching. Brockman said GPT-5.5 represents an additional step toward creating a unified super app that would combine ChatGPT, Codex, and an AI browser into a single service for enterprise customers. It is an ambitious vision that would essentially make OpenAI a one-stop operating environment for knowledge workers, rather than a model provider that others build on top of. The strategy has competitive logic: locking enterprise customers into a unified OpenAI surface reduces the surface area where Anthropic, Google, or others can compete.
The usage numbers OpenAI disclosed alongside the launch are doing some heavy lifting in the PR narrative. The company said there are 4 million active Codex users and 9 million paying business users on ChatGPT, with the platform exceeding 900 million weekly active users and over 50 million subscribers. Those figures are genuinely enormous and serve as a counterweight to the enterprise narrative that Anthropic has been winning on the quality and trust dimensions that matter to large organizations. Scale and quality are different arguments, and OpenAI is making both simultaneously.
On pricing, OpenAI confirmed that GPT-5.5 is priced higher than GPT-5.4 but claims it is more token-efficient, noting that in Codex the model delivers better results with fewer tokens than its predecessor for most users. Token efficiency is not a glamorous selling point, but for enterprises running millions of agentic workflows, the cost calculus is real and it may be enough to justify the step up in pricing.
The pace of all this is worth stepping back to register: six weeks between major model releases, and cybersecurity red teaming baked into the release process as a first-class concern. OpenAI’s VP of Research Mia Glaese noted that GPT-5.5 underwent extensive third-party safeguard testing and red teaming for cyber and biological risks, with iterative work on cyber safeguards conducted across months of increasingly capable models. That framing is partly a response to the attention Anthropic’s Mythos generated around AI and cybersecurity risks, and it is part of a broader industry shift toward safety as a competitive differentiator rather than just a regulatory obligation.
What OpenAI has done with GPT-5.5 is stabilize its position as the most capable model most people can actually access. It has not definitively answered the harder question of whether it is the most capable model that exists. For the majority of enterprise buyers making decisions today, that distinction may not matter much. For those tracking where the frontier is actually moving, it matters quite a bit.