
OpenAI Launches o3-mini: A Fast, Cost‑Effective AI Model Optimized for STEM Reasoning

OpenAI unveiled the o3-mini family, available in low, medium, and high variants, offering a cheaper, faster, and safer reasoning model that matches or exceeds its predecessor o1 across STEM, coding, and general-knowledge benchmarks, while adding search integration and enhanced safety features.


OpenAI announced the release of the o3-mini series (low, medium, and high variants), positioning it as the company's newest and most cost-effective reasoning model, available now in ChatGPT and via the API.

The new models are accessible to ChatGPT Plus, Team, and Pro users immediately, with enterprise access slated for the coming week; free users can also try o3-mini through the "Reasoning" option.

While o3-mini does not support visual tasks (which still require OpenAI o1), it offers a higher daily message limit (150 messages vs. 50) and integrates a search function that provides up-to-date answers with links to sources.
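For API users, the three variants are selected through a reasoning-effort setting rather than separate model names. The sketch below assembles a Chat Completions request payload for o3-mini; it is a minimal sketch assuming the published payload shape and the `reasoning_effort` parameter ("low", "medium", "high"), and it builds the request dictionary without sending it, so no API key is needed.

```python
# Minimal sketch: choosing an o3-mini reasoning tier via the
# Chat Completions request payload. Assumes the "o3-mini" model name
# and the `reasoning_effort` parameter described in OpenAI's API docs.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a Chat Completions payload for o3-mini."""
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    return {
        "model": "o3-mini",
        "reasoning_effort": effort,  # selects the low/medium/high variant
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Prove that sqrt(2) is irrational.", effort="high")
print(payload["model"], payload["reasoning_effort"])
```

In a real client, this payload would be passed to the SDK's chat-completions call; everything else about the request (system messages, streaming, etc.) is unchanged from other chat models.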

Performance evaluations show o3-mini-medium achieving parity with o1 in mathematics, programming, and scientific reasoning, while responding about 24% faster than o1-mini (average latency 7.7 s vs. 10.16 s). Expert testers preferred o3-mini's answers in 56% of cases and observed a 39% reduction in major errors on difficult real-world problems.

Benchmark results include:

Competitive math (AIME 2024): o3-mini‑low matches o1‑mini, o3-mini‑medium matches o1, and o3-mini‑high surpasses both.

Doctor‑level science (GPQA Diamond): o3-mini‑low outperforms o1‑mini; o3-mini‑high matches o1 on biology, chemistry, and physics.

FrontierMath research-level math: o3-mini-high solves more than 32% of problems on the first attempt, including over 28% of the hardest (T3) tasks.

Codeforces programming contests: o3-mini scores increase with model tier, all exceeding o1‑mini; medium matches o1.

SWE‑bench Verified software engineering: o3-mini‑high reaches 39% accuracy with open‑source agents and 61% with internal tools, the best OpenAI model to date.

LiveBench coding: even o3-mini‑medium outperforms o1‑high, with o3-mini‑high further extending the lead.

General knowledge: o3-mini consistently outperforms o1‑mini across domains.

Safety assessments show that o3-mini uses deliberative alignment, a technique that trains the model to reason over human-written safety guidelines before responding. It surpasses GPT-4o on challenging safety and jailbreak evaluations, benefiting from the same red-teaming and evaluation processes used for o1.

Looking ahead, OpenAI emphasizes that o3-mini continues its trend of reducing token costs (95% cheaper than GPT‑4) while maintaining high‑quality reasoning, aiming to make powerful AI more accessible and balanced in terms of intelligence, efficiency, and safety.

Tags: OpenAI · AI model · AI safety · model performance · o3-mini · STEM reasoning
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
