Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)

Google announced Gemini, a family of multimodal large language models in three sizes (Ultra, Pro, and Nano). Google reports state‑of‑the‑art results on 30 of 32 benchmarks, describes the models as natively multimodal from pre‑training onward, and is integrating them across products such as Bard, Search, and Pixel devices.

Rare Earth Juejin Tech Community

Dec 9, 2023

Google announced Gemini, its latest multimodal large model family comprising three versions: Gemini Ultra (the largest and most capable), Gemini Pro (scalable for many tasks), and Gemini Nano (efficient for edge devices).

Gemini Ultra set state‑of‑the‑art (SOTA) results on 30 of 32 benchmarks. According to Google, it is the first model to outperform human experts on the MMLU benchmark, and it scored 59.4% on the multimodal MMMU benchmark, also a SOTA result.

Gemini Pro began rolling out in Google Bard at the time of the announcement, while Gemini Nano will power on‑device features on the Pixel 8 Pro smartphone.

Google released a 60‑page technical report describing the models' native multimodal pre‑training approach: Gemini is trained on multiple modalities from the start and then fine‑tuned with additional multimodal data, which Google says improves reasoning and cross‑modal understanding.

The architecture builds on an enhanced Transformer decoder with efficient attention mechanisms (e.g., multi‑query attention) and supports a context length of up to 32k tokens. Training ran on TPUv5e and TPUv4 hardware across multiple data centers.
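
To illustrate why multi‑query attention is considered efficient at long context lengths, the following back‑of‑the‑envelope sketch compares the size of the key/value cache under standard multi‑head attention and under multi‑query attention, where all query heads share one key/value head. The layer count, head count, head dimension, and 2‑byte value size are hypothetical numbers chosen for illustration; Gemini's actual dimensions are not given in this article, and the helper `kv_cache_bytes` is not from any Google library.

```python
def kv_cache_bytes(context_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * context_len * num_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, for illustration only (not Gemini's real shape).
CONTEXT_LEN = 32 * 1024   # 32k tokens, the context length mentioned above
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
HEAD_DIM = 128

# Multi-head attention: every query head has its own key/value head.
mha_bytes = kv_cache_bytes(CONTEXT_LEN, NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM)
# Multi-query attention: all query heads share a single key/value head.
mqa_bytes = kv_cache_bytes(CONTEXT_LEN, NUM_LAYERS, 1, HEAD_DIM)

GIB = 1024 ** 3
print(f"multi-head KV cache:  {mha_bytes / GIB:.2f} GiB")
print(f"multi-query KV cache: {mqa_bytes / GIB:.2f} GiB")
print(f"reduction factor:     {mha_bytes // mqa_bytes}x")
```

Under these assumed dimensions the cache shrinks by a factor equal to the number of query heads, which is the main reason the technique helps with long contexts during inference.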

"This training method enables Gemini to seamlessly understand and reason about diverse inputs, surpassing existing multimodal models in virtually every domain," Google said.

Use‑case demonstrations show Gemini extracting data from thousands of scientific papers, assisting students with handwritten problem solving, and performing image‑based reasoning such as identifying movies from combined visual cues.

Google plans to integrate Gemini across its ecosystem, including Search, Ads, Chrome, and Duet AI, and is also introducing a new generation of TPUs (Cloud TPU v5p) to support training at this scale.

Overall, Gemini represents Google’s effort to catch up with OpenAI’s GPT series by delivering a powerful, native multimodal AI platform for both cloud services and edge devices.

