OpenAI Unveils GPT‑4o: An Omni‑Capable Multimodal Model Offered Free to All Users
OpenAI has introduced GPT‑4o, a free, omni‑capable multimodal model that processes text, audio, and images together and responds at near‑human conversational latency. The launch featured striking live demos, and the model will soon be available via a discounted API, a significant step forward for end‑to‑end multimodal AI.
OpenAI announced its latest flagship model, GPT‑4o, which is free for all users and accepts any combination of text, audio, and image inputs to generate corresponding outputs, living up to the “omni” (all‑capable) in its name.
The model can respond to audio input in as little as 232 ms (320 ms on average), matching human conversational speed, and moves between voice, vision, and text without noticeable delay.
GPT‑4o’s capabilities were demonstrated live: the model could pick up on a user’s breathing rhythm, respond with richer vocal tones, handle being interrupted mid‑reply, and hold real‑time, video‑call‑like conversations.
All ChatGPT Plus features—including vision, web browsing, memory, code execution, and the GPT Store—are now available for free, and the API will be offered at a 50% discount with double the request throughput.
During the launch, CTO Mira Murati and President Greg Brockman presented live demos, including a scenario in which the model acted as a real‑time interpreter between English and Italian, and a playful interaction in which two instances of ChatGPT (one legacy, one new with visual abilities) conversed and sang together.
The demo highlighted GPT‑4o’s end‑to‑end training: unlike the previous three‑stage pipeline (speech‑to‑text → GPT‑3.5/4 → text‑to‑speech), the new model processes all modalities within a single neural network, eliminating the 2.8‑second (GPT‑3.5) and 5.4‑second (GPT‑4) latencies of the old system.
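The latency arithmetic behind that comparison can be sketched in a few lines. Only the end‑to‑end totals (2.8 s, 5.4 s, 320 ms, 232 ms) come from the article; the per‑stage splits below are hypothetical round numbers used purely for illustration:

```python
# Why the legacy Voice Mode cascade was slow: each stage's latency adds up.
# Per-stage splits are hypothetical; only the totals are reported figures.

def cascade_latency(stage_latencies_ms):
    """A cascaded pipeline can respond no faster than the sum of its stages."""
    return sum(stage_latencies_ms)

# Hypothetical split of the old cascade: speech-to-text -> LLM -> text-to-speech
gpt35_cascade_ms = cascade_latency([400, 2000, 400])  # reported total: 2800 ms
gpt4_cascade_ms = cascade_latency([600, 4200, 600])   # reported total: 5400 ms

GPT4O_AVG_MS = 320   # GPT-4o single-model average (reported)
GPT4O_BEST_MS = 232  # GPT-4o fastest response (reported)

print(f"Cascade (GPT-3.5): {gpt35_cascade_ms} ms")
print(f"Cascade (GPT-4):   {gpt4_cascade_ms} ms")
print(f"GPT-4o average:    {GPT4O_AVG_MS} ms "
      f"({gpt35_cascade_ms / GPT4O_AVG_MS:.1f}x faster than the GPT-3.5 cascade)")
```

A single network that maps audio directly to audio removes the intermediate text hand‑offs entirely, which is why GPT‑4o's average latency sits an order of magnitude below the old cascade.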
Benchmarks show GPT‑4o surpassing specialized models such as Whisper v3 in speech translation and outperforming Gemini 1.0 Ultra and Claude Opus in visual understanding.
One scholar quoted in the piece remarked that a successful demo like this is worth a thousand papers.
The article also reminds readers of the upcoming Google I/O conference on May 15 and hints at further OpenAI announcements in the near future.
Rare Earth Juejin Tech Community