OpenAI Drops o3 Price by 80% and Announces New Flex Mode

The OpenAI o3 price was previously a barrier for many who wanted to leverage the model's advanced reasoning capabilities, but the new pricing greatly improves accessibility. OpenAI's recent announcement of an 80% price drop for its o3 model shifts how developers, startups, and enterprises approach large language models (LLMs).

Understanding the OpenAI o3 Price Drop

At its core, this move reflects OpenAI's strategy to democratize access to high-performance AI models. As Sam Altman, CEO of OpenAI, mentioned on X, "we dropped the price of o3 by 80%!! excited to see what people will do with it now." This change positions the model as more competitive against rivals like Google DeepMind's Gemini 2.5 Pro or Anthropic's Claude Opus 4. As OpenAI continues refining its offerings, highlighted further by the upcoming launch of o3 Pro, the landscape is evolving toward more affordable yet powerful AI tools that cater to a broader audience.

What is the new pricing structure?

Before this adjustment, using the o3 model cost $10 per million input tokens and $40 per million output tokens, steep for many developers and organizations. Now those rates have plummeted to just $2 per million input tokens and $8 per million output tokens, drastically reducing operational costs. On top of that, cached input tokens are billed at just $0.50 per million: if you resend prompt content the API has already processed, the reused portion costs a fraction of the full input rate.
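To make the arithmetic concrete, here is a minimal Python sketch of a cost estimator built on the rates quoted above; the constants and the helper function are illustrative, not an official calculator.

```python
# Estimate an o3 bill under the new pricing quoted in this article.
O3_INPUT_PER_M = 2.00         # USD per 1M fresh input tokens
O3_CACHED_INPUT_PER_M = 0.50  # USD per 1M cached input tokens
O3_OUTPUT_PER_M = 8.00        # USD per 1M output tokens

def o3_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimated USD cost of a request, splitting input into fresh vs. cached tokens."""
    fresh = input_tokens - cached_input_tokens
    return (
        fresh / 1e6 * O3_INPUT_PER_M
        + cached_input_tokens / 1e6 * O3_CACHED_INPUT_PER_M
        + output_tokens / 1e6 * O3_OUTPUT_PER_M
    )

# 500K input tokens (half of them cache hits) plus 200K output tokens:
print(f"${o3_cost(500_000, 200_000, cached_input_tokens=250_000):.2f}")
```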

This shift isn't just about saving money; it's about enabling experimentation at scale without breaking the bank. For example, a developer working on complex reasoning tasks can now prototype or deploy solutions with much less financial friction than before. The new pricing aligns OpenAI's flagship model more closely with other cost-effective options in the market while maintaining its edge in performance.
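As a rough illustration of how little ceremony prototyping now requires, here is a minimal sketch using the official openai Python SDK; the prompt is invented, and availability of the model under the name "o3" should be confirmed against your account's model list.

```python
# Minimal o3 prototype call (pip install openai; OPENAI_API_KEY in the environment).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o3",  # the reasoning model discussed in this article
    messages=[
        {"role": "user", "content": "Outline a three-step plan to deduplicate a 10M-row dataset."}
    ],
)

print(response.choices[0].message.content)
# Token counts drive the bill, so log usage alongside the output:
print(response.usage)  # prompt_tokens, completion_tokens, total_tokens
```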

How does the 80% reduction impact users?

The impact on users is multifaceted. First off: affordability opens doors for small startups and independent developers who previously found access prohibitive due to high costs. With prices lowered by 80%, integrating advanced reasoning capabilities into products becomes feasible without massive budgets.

Second: broader adoption encourages innovation across various industries, from healthcare diagnostics to financial modeling, since experimenting with cutting-edge models no longer requires significant capital investment. This democratization also accelerates research progress because more teams can test ideas rapidly and iterate freely.

Moreover, existing users benefit from reduced operational expenses when scaling their applications or conducting large-scale testing during development phases. The reduced prices promote wider deployment scenarios where cost-efficiency is crucial: think real-time chatbots or extensive content analysis pipelines.

Comparison with previous costs

| Model | Previous Cost (per 1M tokens) | Current Cost (per 1M tokens) | Percentage Drop |
| --- | --- | --- | --- |
| OpenAI o3 | $10 (input), $40 (output) | $2 (input), $8 (output) | 80% |
| Gemini 2.5 Pro | ~$1.25–$2.50 (input), $10–$15 (output) | N/A | N/A |
| Claude Opus 4 | $15 (input), $75 (output) | Same | No change |
| DeepSeek-Reasoner | ~$0.14–$0.55 (input), ~$1.10–$2.19 (output) | Lower during off-peak hours | Variable |

This table highlights how dramatic the price cut truly is, compared not only against o3's own previous rates but also against competitors like Google's Gemini series or Anthropic's Claude models.
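To turn the table into intuition, the sketch below prices a hypothetical job (100K input tokens, 20K output tokens) at each quoted rate. The DeepSeek figures are rough midpoints of the ranges above, and Gemini 2.5 Pro is omitted because the table lists no updated rate for it.

```python
# Rough per-job cost comparison using the approximate rates from the table above.
# Each entry is (USD per 1M input tokens, USD per 1M output tokens).
RATES = {
    "OpenAI o3 (new)": (2.00, 8.00),
    "OpenAI o3 (old)": (10.00, 40.00),
    "Claude Opus 4": (15.00, 75.00),
    "DeepSeek-Reasoner (midpoint)": (0.35, 1.65),
}

def job_cost(rate: tuple[float, float], input_tokens: int, output_tokens: int) -> float:
    inp, out = rate
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

for name, rate in RATES.items():
    print(f"{name:30s} ${job_cost(rate, 100_000, 20_000):.2f}")
```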

Introducing OpenAI's Flex Mode: Lower-Cost Processing for Flexible Workloads

Alongside this major pricing update comes a feature called Flex Mode, designed for workloads that can trade slower, best-effort processing for lower and more predictable costs.

What is Flex Mode and how does it work?

Flex Mode gives developers a cheaper way to run tasks that are not latency-sensitive, a common scenario in batch analysis, model evaluations, or data-enrichment pipelines. Instead of standard-tier billing, Flex Mode charges fixed per-token rates: $5 per million input tokens and $20 per million output tokens.

This setup allows fine-tuning the trade-off between cost efficiency and turnaround time depending on workload characteristics:

  • For projects where cost predictability matters more than response speed.
  • When recurrent operations involve similar inputs that can be cached.

It essentially offers a more predictable billing structure while preserving access to high-quality reasoning models such as o3 and, once available, o3-pro.
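A request opts into Flex processing through the API's service tier setting. The sketch below shows one plausible call shape with the openai Python SDK; the `service_tier="flex"` parameter reflects OpenAI's published Flex processing interface, but confirm the flag and supported models against current docs before relying on it.

```python
# Sketch of a Flex Mode request; Flex jobs may run slower than standard ones,
# so the client timeout is raised accordingly.
from openai import OpenAI

client = OpenAI()

response = client.with_options(timeout=900.0).chat.completions.create(
    model="o3",
    service_tier="flex",  # opt this request into Flex pricing
    messages=[
        {"role": "user", "content": "Summarize this quarter's support tickets by root cause."}
    ],
)

print(response.choices[0].message.content)
```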

Pricing details for Flex Mode: $5 per input, $20 per output per million tokens

Here’s a quick breakdown:

  • Input processing in Flex Mode costs $5/million tokens, which covers all input data sent to the model.
  • Output generation runs at $20/million tokens, encompassing responses generated by the system.

For context:

  • If you process 1 million tokens as input and generate 1 million as output within one session, your total would be exactly $25.
  • For smaller tasks or batch processing involving cached data or partial reuse strategies, expenses decrease further due to caching discounts similar to those in standard API use cases.
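A two-line check of that session example, using the Flex rates quoted in this section:

```python
# Verify the $25 session example above at Flex rates (USD per 1M tokens).
FLEX_INPUT_PER_M, FLEX_OUTPUT_PER_M = 5.00, 20.00

def flex_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * FLEX_INPUT_PER_M + output_tokens / 1e6 * FLEX_OUTPUT_PER_M

print(flex_cost(1_000_000, 1_000_000))  # -> 25.0
```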

This arrangement targets use cases that can tolerate slower responses in exchange for strict budget control, such as non-urgent analysis jobs or large evaluation runs where cost matters more than immediate turnaround.

Use cases and benefits of Flex Mode

Flex Mode shines particularly when:

  • Workloads are not latency-critical; think overnight batch analysis, model evaluations, or data enrichment.
  • Applications require predictable cost management over fluctuating workloads.
  • Developers want simplified billing without constantly monitoring variable token counts.

Its flexibility makes it ideal for deploying sophisticated AI features at scale without sacrificing budget control, a critical factor especially during early-stage prototyping or rapid iteration cycles.

Some specific benefits include:

  • Cost transparency helps project managers forecast expenses accurately.
  • Improved control over latency versus cost trade-offs.
  • Ability to combine flexible billing with advanced reasoning capabilities provided by models like o3-pro.
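For expense forecasting, a back-of-the-envelope sketch like the one below is often enough; every workload figure in it is hypothetical and should be swapped for your own traffic estimates.

```python
# Hypothetical monthly budget forecast at the Flex rates quoted above.
REQUESTS_PER_DAY = 5_000
AVG_INPUT_TOKENS = 1_200
AVG_OUTPUT_TOKENS = 400
DAYS = 30

total_input = REQUESTS_PER_DAY * AVG_INPUT_TOKENS * DAYS
total_output = REQUESTS_PER_DAY * AVG_OUTPUT_TOKENS * DAYS

monthly = total_input / 1e6 * 5.00 + total_output / 1e6 * 20.00
print(f"Projected monthly spend: ${monthly:,.2f}")  # $2,100.00 for these assumptions
```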

Platforms focused on enterprise solutions find this mode attractive because it balances performance demands with budget constraints effectively, from initial development through production deployment.


In summary: the dual announcement of a sharply reduced OpenAI o3 price alongside a versatile Flex Mode introduces new opportunities across multiple sectors seeking affordable yet powerful AI solutions. Developers now have compelling reasons, and more economical tools, to experiment broadly while maintaining precise control over resource consumption and costs.

Frequently asked questions on OpenAI o3 price

How does the new Flex Mode impact costs for high-volume applications?

Flex Mode introduces a predictable billing structure where users pay $5 per million tokens for input processing and $20 per million tokens for output generation. Those fixed per-token rates simplify budgeting for high-volume, non-urgent tasks like batch analysis or evaluation runs, making expenses easier to manage while preserving high-quality AI performance.

How do the new prices compare to previous costs of the OpenAI o3 model?

Before the update, using the OpenAI o3 model cost $10 per million input tokens and $40 per million output tokens; those rates have now been cut by 80%, to just $2 and $8 respectively. This massive price cut significantly lowers barriers for deploying large language models at scale.

How does Flex Mode work in terms of pricing?

Flex Mode charges fixed per-million-token rates, $5 for input processing and $20 for output generation, providing a straightforward way to manage costs for workloads that can tolerate slower, best-effort responses.

Can I combine cached data with Flex Mode to reduce expenses?

Certain discounts apply when you reuse or cache data during processing, which can further lower your overall costs within Flex Mode. This feature is especially handy if your application involves repetitive inputs that don't change often.

Is Flex Mode suitable for all types of AI workloads?

No, it's best suited for workloads that can tolerate slower, best-effort responses in exchange for lower, steadier costs, like batch analysis, model evaluations, or data enrichment; latency-sensitive applications such as live chatbots are better served by standard processing.