Apple Intelligence: Inside the New Apple Foundation Model
Apple Intelligence, the on‑device AI suite debuting with the iOS 18.1 beta, is built around the Apple Foundation Model: a 3‑billion‑parameter on‑device LLM, paired with a larger undisclosed cloud variant, trained entirely on Google TPUs and refined with novel RL algorithms and mixed‑precision quantization. It powers the revamped Siri, writing assistance, and natural‑language photo search, surpasses GPT‑4 on several benchmarks, and is currently limited to paid developer accounts.
Apple has launched Apple Intelligence, an on‑device AI suite available to developers with the iOS 18.1 beta. The rollout includes a revamped Siri that supports both voice and text interaction, a writing assistant that can polish tweets and comments, and a photo search feature powered by natural‑language queries.
The core of Apple Intelligence is the Apple Foundation Model (AFM), a family of large language models. The on‑device version has roughly 3 B parameters, while a larger cloud version is kept undisclosed. Both models use a 32 k context window.
AFM is trained exclusively on Google TPU hardware (8192 TPUv4 chips for the cloud model and 2048 TPUv5p chips for the on‑device model), with no Nvidia GPUs involved. Training is performed with Apple’s JAX‑based AXLearn framework, employing tensor‑parallelism and pipeline‑parallelism.
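To make the tensor‑parallelism idea concrete, here is a minimal pure‑Python sketch of column‑parallel matrix multiplication: the weight matrix of a linear layer is split column‑wise into shards, each shard computes its slice of the output, and the slices are concatenated. The function names and layout here are illustrative only, not Apple's AXLearn API.

```python
def matmul(x, w):
    """Multiply a vector x (length k) by a k x n weight matrix w."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]

def shard_columns(w, num_shards):
    """Split the weight matrix's columns into contiguous shards."""
    n = len(w[0])
    step = n // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]

def parallel_matmul(x, w, num_shards):
    """Column-parallel matmul: each shard's output slice is computed
    independently (on separate chips, in a real system), then concatenated."""
    out = []
    for shard in shard_columns(w, num_shards):
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
assert parallel_matmul(x, w, 2) == matmul(x, w)  # sharded result matches the full matmul
```

Pipeline parallelism is complementary: instead of splitting one layer's weights, it places consecutive layers on different chips and streams micro‑batches through them.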
Data for pre‑training comes from Applebot‑crawled web pages and publicly licensed code and math datasets, all under permissive licenses (MIT, Apache, CC0). The training pipeline consists of three stages: core training (6.3 T tokens, 4096‑token window), continued training (1 T tokens, 8192‑token window) and context‑extension (up to 32 k tokens, 100 B tokens).
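The three stages above can be written out as a small config table; summing the stated budgets gives the total pre‑training token count (the stage names are shorthand from this article, not official identifiers):

```python
# Pre-training stages as described: token budget and context window per stage.
STAGES = [
    {"name": "core",              "tokens": 6.3e12, "context": 4096},
    {"name": "continued",         "tokens": 1.0e12, "context": 8192},
    {"name": "context-extension", "tokens": 1.0e11, "context": 32768},
]

total = sum(s["tokens"] for s in STAGES)
print(f"total pre-training tokens: {total / 1e12:.1f}T")  # 7.4T
```

Note how the token budget shrinks roughly an order of magnitude per stage while the context window grows, which keeps the expensive long‑context training to a small fraction of total compute.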
After pre‑training, AFM undergoes supervised fine‑tuning (SFT) and reinforcement learning from human feedback (RLHF). Apple introduced two novel RL algorithms: iTeC (Iterative Teaching Committee) and MDLOO (Mirror Descent with Leave‑One‑Out advantage estimation), which together combine preference‑based optimization, DPO, and online policy updates.
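The leave‑one‑out idea behind MDLOO can be sketched in a few lines: for a batch of sampled responses, each sample's advantage is its reward minus the mean reward of the *other* samples, giving an unbiased baseline without a learned value network. This is a generic sketch of the estimator, not Apple's implementation, which is not public.

```python
def loo_advantages(rewards):
    """Leave-one-out baseline: advantage_i = r_i - mean(r_j for j != i)."""
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

# Three sampled responses to the same prompt, scored by a reward model:
adv = loo_advantages([1.0, 2.0, 3.0])
print(adv)  # [-1.5, 0.0, 1.5] — above-average samples get positive advantage
```

In online RL, these advantages would weight the policy‑gradient update for each sampled response; the mirror‑descent part of MDLOO constrains how far each update moves the policy.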
For on‑device efficiency, Apple applies a mixed‑precision quantization “palette” strategy: projection weights share 4‑bit constants per 16‑column group, embeddings use 8‑bit per‑channel quantization, and less critical layers are compressed to 2‑bit. Accuracy‑Recovery Adapters are added to mitigate quantization loss.
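A toy version of the shared‑constant idea: each small group of weights shares one scale, and individual weights are stored as low‑bit signed integers. This is a generic grouped min‑max quantizer for illustration; Apple's palettization scheme differs in detail.

```python
def quantize_group(weights, bits=4):
    """Quantize one group of weights to signed ints with a single shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero on all-zero groups
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate weights from integers and the shared scale."""
    return [v * scale for v in q]

group = [0.5, -1.0, 0.25, 0.75]       # one (tiny) weight group
q, s = quantize_group(group)
approx = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step per weight:
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(group, approx))
```

Scaling this up, a 16‑column group needs only one shared constant plus 4 bits per weight, which is where the large memory savings come from; the Accuracy‑Recovery Adapters are then fine‑tuned to compensate for the residual error.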
Evaluation shows AFM surpasses GPT‑4 on several instruction‑following and summarization benchmarks, achieving state‑of‑the‑art results on IFEval and strong results on AlpacaEval, GSM8K, and MATH. Safety tests indicate lower violation rates under adversarial prompts than competing models.
Access to Apple Intelligence is limited to registered developers (US$99/year) on devices with M‑series or A17 Pro chips, and requires US regional settings and English language configuration. The full feature set is expected to roll out later, with the public release possibly delayed.