OpenAI Retires GPT-4.5 and o3: Ushering in the GPT-5 Era

OpenAI is retiring GPT-4.5 from ChatGPT on June 27 and o3 on August 26, ending access to two paid-user models after 30-day and 90-day sunset periods, respectively. The changes, announced quietly in ChatGPT’s model-release notes on May 28, apply only to ChatGPT: OpenAI says there are no API changes at this time.[1]

The decision further concentrates the consumer product around the GPT-5 generation, which OpenAI introduced in August 2025 as a unified system that can route between fast answers and deeper reasoning. It also closes a distinct chapter in OpenAI’s product strategy: GPT-4.5 was an expensive, conventional large language model built through scaled pretraining, while o3 was a reasoning model designed to spend more time and computation working through difficult tasks. Their retirement does not mean GPT-5 exactly reproduces every behavior users valued in either model.

A ChatGPT consolidation, not an API shutdown

GPT-4.5 and o3 were available in ChatGPT through model settings for paid subscribers. After their respective sunset dates, they will no longer be selectable in the chat product. OpenAI described the move as an effort to reduce use of older models and focus ChatGPT on newer ones.[1]

That scope matters for developers and businesses. OpenAI explicitly stated that the announcement involves no API changes. As of May 30, the company has not announced the removal of GPT-4.5 or o3 API access, the shutdown of specific model snapshots, or a date for API deprecation.[1]

The move does remove the last GPT-4-branded option still selectable in ChatGPT, but describing it simply as the end of “GPT-4 models” would obscure an important technical distinction. o3 is not a GPT-4 model; it belongs to OpenAI’s separate o-series family of reasoning models. The release-note update instead retires one late-generation GPT model and one prominent o-series model as ChatGPT moves to a GPT-5-centered lineup.

Why GPT-4.5 had a limited role

OpenAI launched GPT-4.5 as a research preview in February 2025. It was the company’s largest and most compute-intensive model at the time, though OpenAI did not disclose its parameter count. Rather than relying on the extended test-time reasoning used by o-series systems, GPT-4.5 chiefly pursued gains through more conventional scaling of pretraining: data, compute, architecture and optimization.[2]

OpenAI positioned the model as broadly capable, with stronger world knowledge, pattern recognition, creativity and conversational quality. It supported search, file and image uploads, vision inputs, function calling, structured outputs and streaming. But it did not deliberate before answering in the way o1 and o3 did, and it initially lacked some ChatGPT features, including voice, video and screen sharing.[2]

Its benchmark profile showed the tradeoff. GPT-4.5 improved on GPT-4o across general knowledge and multimodal tests, including 71.4% on GPQA science, 85.1% on MMMLU and 74.4% on MMMU. But it was less competitive on tasks where explicit reasoning mattered: it scored 36.7% on AIME 2024 mathematics, compared with 87.3% for o3-mini-high, and 38.0% on SWE-bench Verified, compared with 61.0% for o3-mini-high.[2]

Economics likely limited its long-term place in the lineup. OpenAI’s API documentation priced GPT-4.5 at $75 per million input tokens and $150 per million output tokens, substantially above GPT-4o’s then-current pricing. OpenAI had cautioned at launch that GPT-4.5 was not a replacement for GPT-4o.[2] Independent coding evaluations also questioned whether its incremental gains warranted that cost; Ars Technica reported that developer Paul Gauthier’s Aider Polyglot benchmark placed it tenth, behind several reasoning-oriented competitors.[7]
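To make those list prices concrete, here is a minimal Python sketch of the per-request arithmetic. The two prices come from the figures quoted above; the workload (2,000 input tokens and 500 output tokens per request) and the function name `request_cost` are illustrative assumptions, not anything OpenAI publishes.

```python
# GPT-4.5 list prices quoted in the article (USD per million tokens).
GPT45_INPUT_PER_M = 75.0
GPT45_OUTPUT_PER_M = 150.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = GPT45_INPUT_PER_M,
                 output_per_m: float = GPT45_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    total = input_tokens * input_per_m + output_tokens * output_per_m
    return total / 1_000_000


# Hypothetical workload (an assumption for illustration only).
per_request = request_cost(2_000, 500)
print(f"per request: ${per_request:.3f}")
print(f"per 10,000 requests: ${per_request * 10_000:,.2f}")
```

Under these assumed token counts, a single request costs a little over 22 cents, which suggests why the model was a hard fit for high-volume use.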

o3’s reasoning promise and its practical constraints

o3 represented the opposite design choice. Released with o4-mini in April 2025, it was trained to “think for longer” before responding. Adjustable reasoning effort allowed users to trade speed and cost for more test-time computation, while the production model added visual-input capabilities and product-oriented chat improvements.[3]

The model drew early attention from its December 2024 preview, which posted 76% on ARC-AGI-1 at low compute and 88% at high compute. ARC-AGI is intended to test abstract reasoning and generalization on unfamiliar visual puzzles, making the result especially prominent in debates around frontier-model progress.[4]

But the headline results came from a preview system operating with unusually large test-time resources. Subsequent testing by the ARC Prize Foundation of production o3 produced lower scores under more practical configurations: 41% on ARC-AGI-1 at low reasoning effort and 53% at medium effort, while scores on the more difficult ARC-AGI-2 were below 3% at both settings.[4]

High-effort runs also revealed operational limits. ARC Prize said o3-high returned answers for only 37 of 100 ARC-AGI-1 tasks and 15 of 120 ARC-AGI-2 tasks in its partial sample, with many runs failing to finish within testing limits. The findings do not negate o3’s capabilities, but they illustrate why a model’s best-case benchmark number can differ sharply from its performance, latency and cost in a deployed product.[4]
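The completion counts above translate directly into rates. The short Python sketch below computes them from the figures ARC Prize reported; the helper name `completion_rate` is illustrative. It also reflects a simple arithmetic point: if unanswered tasks count as failures, a score over all attempted tasks cannot exceed the share of tasks that returned an answer.

```python
from fractions import Fraction

# (answered, attempted) for o3-high in ARC Prize's partial sample,
# as reported in the article.
runs = {"ARC-AGI-1": (37, 100), "ARC-AGI-2": (15, 120)}


def completion_rate(answered: int, attempted: int) -> Fraction:
    """Share of attempted tasks that returned any answer."""
    return Fraction(answered, attempted)


for name, (answered, attempted) in runs.items():
    rate = completion_rate(answered, attempted)
    # With unanswered tasks scored as failures, this rate is also the
    # ceiling on accuracy over the full set of attempted tasks.
    print(f"{name}: {float(rate):.1%} answered, "
          f"{float(1 - rate):.1%} unanswered")
```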

GPT-5 becomes the default direction

OpenAI launched GPT-5 in August 2025 as a successor platform for both ChatGPT and the API. Its central product premise was simplification: a unified system combining rapid responses with deeper reasoning, rather than requiring users to manually navigate a growing set of separate model families.[5]

OpenAI says GPT-5 with reasoning outperformed o3 while using 50% to 80% fewer output tokens across visual reasoning, agentic coding and graduate-level science tasks. The company also reported that GPT-5 responses were about 45% less likely to contain factual errors than GPT-4o, and that GPT-5 with reasoning was about 80% less likely than o3 to make factual errors on anonymized, web-enabled production-style prompts.[5]

Those are OpenAI’s internal evaluations, not independent measurements, and they should be read accordingly. OpenAI additionally reported roughly six times fewer hallucinations than o3 on LongFact and FActScore tests. In one visual-reasoning evaluation where images had been removed from prompts, it said o3 confidently answered questions about nonexistent images 86.7% of the time, versus 9% for GPT-5.[5]
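Vendor claims phrased as "N times fewer" and "X% less likely" are easier to compare once they are put on the same scale. The Python sketch below does that conversion using only the figures quoted above; the function names are illustrative, and the calculation takes OpenAI's reported numbers at face value without validating them.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


def times_fewer_to_reduction(factor: float) -> float:
    """Convert 'N times fewer' into a fractional reduction (1 - 1/N)."""
    return 1 - 1 / factor


# Rates OpenAI reported for answering about nonexistent images.
o3_rate, gpt5_rate = 0.867, 0.09

print(f"ratio: {o3_rate / gpt5_rate:.1f}x")
print(f"relative reduction: {relative_reduction(o3_rate, gpt5_rate):.1%}")
print(f"'six times fewer' as a reduction: {times_fewer_to_reduction(6):.1%}")
```

Expressed this way, "six times fewer" and "about 80% less likely" describe reductions of similar size, even though the two phrasings sound quite different.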

The claims help explain the consolidation: GPT-5 is intended to cover both the general-purpose role of models such as GPT-4.5 and the deeper reasoning role associated with o3, with less user-facing complexity. But OpenAI has not said GPT-5 is technically identical to either predecessor or that it preserves every capability, output style or workflow behavior they offered.

Users may notice more than benchmark changes

The retirement has prompted pushback from some users who prefer GPT-4.5’s prose and conversational style or o3’s more focused reasoning behavior. TechRadar reported user comments arguing that GPT-5-series models had not fully replaced those characteristics.[8] Such reactions reflect a recurring complication in AI product consolidation: benchmark performance is only one part of model utility. Writers, developers and other frequent users also care about tone, creativity, consistency, latency and the behavior they have built into existing workflows.

Outside assessments of the GPT-5 transition have similarly been measured. Cornell computer-science professor John Thickstun described GPT-5’s gains on existing benchmarks as “modest but significant,” while arguing that the system differed enough from GPT-4 to reset OpenAI’s flagship technology and create room for further advances.[6]

For OpenAI, retiring GPT-4.5 and o3 is therefore best understood as product rationalization rather than a declaration that either model was unsuccessful. GPT-4.5 exposed the cost limits of simply making a general-purpose model larger. o3 demonstrated both the power and expense of extended reasoning. By removing both from ChatGPT, OpenAI is betting that a unified GPT-5-era experience is easier to use, cheaper to operate and strong enough for most of the work those specialized choices once served.

Editor’s Take

I see this less as OpenAI declaring two models obsolete than admitting that a sprawling model picker is a poor product interface. GPT-4.5 represented costly, broad pretraining; o3 represented costly test-time reasoning. A unified GPT-5 route is commercially sensible if it reliably chooses the right tradeoff without making users guess which acronym fits a task.

The caveat is that model consolidation can erase useful workflow characteristics even when aggregate benchmark scores rise. Teams that depend on GPT-4.5’s writing style or o3’s deliberate problem-solving should capture representative prompts and evaluate GPT-5 against their own acceptance criteria now, rather than treating a vendor benchmark as a migration plan. The key thing to watch is whether OpenAI preserves API availability and gives developers stable, controllable reasoning and latency settings; that matters more in production than a cleaner consumer model menu.
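One way to act on that advice is a small regression harness that runs saved prompts through a model and checks each output against a team's own criteria. The Python sketch below is a minimal illustration under stated assumptions: `Case`, `evaluate`, `pass_rate` and `stub_model` are hypothetical names, the two acceptance checks are placeholders, and `stub_model` stands in for whatever real model call a team would plug in. It does not use any vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    name: str
    prompt: str
    check: Callable[[str], bool]  # the team's own acceptance criterion


def evaluate(generate: Callable[[str], str],
             cases: List[Case]) -> Dict[str, bool]:
    """Run every case through `generate` and record pass/fail by name."""
    return {case.name: case.check(generate(case.prompt)) for case in cases}


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty result set)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical cases; real ones would come from captured production prompts.
cases = [
    Case("summary_length", "Summarize the release notes in under 50 words.",
         lambda out: 0 < len(out.split()) < 50),
    Case("mentions_scope", "Does the retirement affect the API?",
         lambda out: "api" in out.lower()),
]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your client)."""
    return "The change applies to ChatGPT only; no API changes were announced."


results = evaluate(stub_model, cases)
print(results, f"pass rate: {pass_rate(results):.0%}")
```

Because `generate` is just a callable, the same cases can be run against the outgoing model and its replacement, and the two result sets compared case by case.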

References

[1] OpenAI, ChatGPT Model Release Notes – https://help.openai.com/en/articles/9624314-model-release-notes
[2] OpenAI, Introducing GPT-4.5 – https://openai.com/index/introducing-gpt-4-5/
[3] OpenAI, Introducing o3 and o4-mini – https://openai.com/index/introducing-o3-and-o4-mini/
[4] ARC Prize Foundation, Analyzing o3 with ARC-AGI – https://arcprize.org/blog/analyzing-o3-with-arc-agi
[5] OpenAI, Introducing GPT-5 – https://openai.com/index/introducing-gpt-5/
[6] Associated Press, OpenAI launches GPT-5 – https://apnews.com/article/gpt5-openai-chatgpt-artificial-intelligence-d12cd2d6310a2515042067b5d3965aa1
[7] Ars Technica, OpenAI’s GPT-4.5 arrives to mixed reviews – https://arstechnica.com/ai/2025/02/its-a-lemon-openais-largest-ai-model-ever-arrives-to-mixed-reviews/
[8] TechRadar, OpenAI retires the last GPT-4 model from ChatGPT – https://www.techradar.com/ai-platforms-assistants/chatgpt/openai-just-quietly-retired-the-last-of-the-gpt-4-models-and-it-feels-like-the-end-of-an-ai-era
