Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)

Google announced Gemini, a family of multimodal large language models in three sizes (Ultra, Pro, and Nano). The models report state-of-the-art results on 30 of 32 benchmarks, are pre-trained natively on multiple modalities, and are being integrated across Google products such as Bard, Search, and Pixel devices.

Rare Earth Juejin Tech Community

Google announced Gemini, its latest multimodal large model family comprising three versions: Gemini Ultra (the largest and most capable), Gemini Pro (scalable for many tasks), and Gemini Nano (efficient for edge devices).

Google reports that Gemini Ultra set new state-of-the-art (SOTA) results on 30 of 32 benchmarks. It is described as the first model to reach human-expert level on the MMLU benchmark, and it scored 59.4% on the multimodal MMMU benchmark, also a new SOTA.

Gemini Pro is rolling out in Google Bard starting today, while Gemini Nano will power on-device features on the Pixel 8 Pro smartphone.

Google released a 60‑page technical report detailing the model’s native multimodal pre‑training approach, which trains on multiple modalities from the start and fine‑tunes with additional multimodal data, improving reasoning and cross‑modal understanding.

The architecture builds on an enhanced Transformer decoder with efficient attention mechanisms (e.g., multi-query attention) and supports a context length of up to 32K tokens. Training used TPUv5e and TPUv4 accelerators across multiple data centers.
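For readers unfamiliar with multi-query attention: every query head shares a single set of keys and values, which shrinks the key/value cache that must be kept in memory during decoding. The following is a minimal pure-Python sketch of that idea with causal masking. The function names, head count, and dimensions are illustrative assumptions, not details of Gemini's implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_query_attention(queries, keys, values):
    """Causal multi-query attention (illustrative sketch).

    queries: [num_heads][seq_len][head_dim], one set of queries per head
    keys, values: [seq_len][head_dim], a single set shared by all heads
    returns: [num_heads][seq_len][head_dim]
    """
    head_dim = len(keys[0])
    scale = 1.0 / math.sqrt(head_dim)
    outputs = []
    for head_q in queries:
        head_out = []
        for t, q in enumerate(head_q):
            # Causal mask: position t only attends to positions 0..t.
            scores = [dot(q, keys[s]) * scale for s in range(t + 1)]
            weights = softmax(scores)
            out = [
                sum(w * values[s][d] for s, w in enumerate(weights))
                for d in range(len(values[0]))
            ]
            head_out.append(out)
        outputs.append(head_out)
    return outputs

def kv_cache_entries(num_kv_heads, seq_len, head_dim):
    # Number of cached scalars per layer: keys plus values.
    return 2 * num_kv_heads * seq_len * head_dim
```

With a hypothetical 8 query heads, `kv_cache_entries(1, 32768, 128)` is one eighth of `kv_cache_entries(8, 32768, 128)`, which is why sharing keys and values becomes attractive at long context lengths.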

"This training method enables Gemini to seamlessly understand and reason about diverse inputs, surpassing existing multimodal models in virtually every domain," Google said.

Use‑case demonstrations show Gemini extracting data from thousands of scientific papers, assisting students with handwritten problem solving, and performing image‑based reasoning such as identifying movies from combined visual cues.

Google plans to integrate Gemini across its ecosystem, including Search, Ads, Chrome, and Duet AI, and is also introducing a new generation of TPUs (Cloud TPU v5p) to support the model’s scale.

Overall, Gemini represents Google’s effort to catch up with OpenAI’s GPT series by delivering a powerful, native multimodal AI platform for both cloud services and edge devices.
