20‑Year‑Old Transformer Co‑author Open‑Sources a 218‑Billion‑Parameter Model
Cohere's Command A+, released under co-founder and CEO Aidan Gomez, a co-author of the Transformer paper, has 218 billion parameters but activates only 25 billion at inference. It uses a 4-bit quantization scheme that Cohere describes as near-lossless, offers native citation support, and runs on a single B200 or two H100 GPUs. It is released under an Apache 2.0 license, a change the article attributes to co-founder Nick Frosst, marking a major shift toward truly open-source, enterprise-ready large language models.
The seminal paper "Attention Is All You Need" sparked the era of large language models. On May 20, Aidan Gomez, one of the paper's youngest co-authors and now Cohere's co-founder and CEO, announced Cohere's first fully open-source model under an Apache 2.0 license: Cohere Command A+.
Command A+ is the final model in the Command A family and Cohere's first mixture-of-experts (MoE) model. It contains 218 billion total parameters, but only 25 billion are active per inference step. This is the core advantage of MoE: each input token is routed to a few specialised expert networks while the rest stay dormant, so the model keeps the knowledge capacity of a giant model at roughly the compute cost of a much smaller one.
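As an illustration of the routing idea described above, here is a minimal top-k gating sketch in plain Python. It is not Cohere's implementation: the expert count, the value of k, the toy "experts", and the random router scores are all hypothetical, and in a real model the router scores come from a learned gating layer. Only the 25-billion and 218-billion figures come from the article; they give an active fraction of roughly 11.5%.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Combine outputs of the selected experts only; the others are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router

routes = top_k_route(logits, TOP_K)
output = moe_forward(3.0, experts, logits, TOP_K)

# Figures from the article: 25B active out of 218B total parameters.
active_fraction = 25e9 / 218e9
print(routes, output, f"{active_fraction:.1%}")
```

The point of the sketch is that `moe_forward` touches only `TOP_K` of the experts per input, which is why compute scales with active rather than total parameters.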
According to VentureBeat, contemporary models such as OpenAI's GPT-5.5 and Anthropic's Claude Opus are at trillion-parameter scale, whereas Command A+ activates only 25 billion parameters. Cohere further reduces compute by applying a second layer of compression: quantization. The model ships in three precision variants: BF16, FP8, and a highly compressed W4A4 version, the last of which is the technical centerpiece.
Typical quantization incurs a “quantization tax” that degrades performance on complex tasks. Cohere’s approach quantises only the MoE experts to 4‑bit while keeping the attention pathways in full precision, and adds Quantization‑Aware Distillation. Cohere claims the W4A4 variant is near‑lossless; benchmark data shows it processes 375 tokens per second at low concurrency with a first‑token latency of 113 ms.
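To make the "quantization tax" concrete, the following sketch applies plain symmetric 4-bit integer quantization to a small group of weights and measures the round-trip error. This is a generic textbook scheme swapped in for illustration; the article does not specify Cohere's actual 4-bit format, and the distillation step is not shown. The weight values and group size are hypothetical.

```python
import random

def quantize_int4(values):
    """Symmetric 4-bit quantization: integers in [-8, 7] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit integer codes back to approximate float values."""
    return [c * scale for c in codes]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(64)]  # hypothetical weight group

codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

# The worst-case rounding error is half a quantization step (scale / 2).
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.5f} max_error={max_error:.5f}")
```

Each weight is stored in 4 bits instead of 16, at the price of the rounding error measured by `max_error`. Keeping attention in higher precision and training against a full-precision teacher, as the article describes, are ways of limiting how much that error hurts downstream accuracy.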
These efficiencies enable Command A+ to run on a single NVIDIA B200 or two H100 GPUs, a stark contrast to the GPU clusters required for earlier hundred‑billion‑parameter models.
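A rough back-of-the-envelope check shows why the 4-bit variant matters for these hardware targets. The sketch below counts weight memory only and treats every parameter as stored at the stated precision, which is a simplification: the article says attention stays at higher precision, and the KV cache and activations need additional memory. The GPU capacities (192 GB for a B200, 80 GB per H100) are assumptions not taken from the article and should be checked against vendor specifications.

```python
TOTAL_PARAMS = 218e9  # total parameter count, from the article

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "W4": 0.5}

# Assumed memory capacities in GB (not from the article).
GPU_MEMORY_GB = {"1x B200": 192, "2x H100": 2 * 80}

def weight_footprint_gb(num_params, bytes_per_param):
    """Weights-only memory footprint in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

footprints = {
    name: weight_footprint_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for precision, gb in footprints.items():
    fits = [gpu for gpu, cap in GPU_MEMORY_GB.items() if gb < cap]
    print(f"{precision}: {gb:.0f} GB of weights, fits on: {fits or 'neither'}")
```

Under these assumptions the weights come to about 436 GB at BF16, 218 GB at FP8, and 109 GB at 4 bits, so only the 4-bit variant fits within either configuration with room left for the cache.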
The model also introduces native citation (grounding spans). When retrieving external information, Command A+ inserts special tags that link each factual statement directly to the source document or database row, dramatically reducing hallucination risk—crucial for regulated domains such as finance, healthcare, and law.
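The article does not show what the citation tags look like, so the sketch below uses an invented `<cite src="...">` format purely to illustrate how an application might turn tagged output into plain text plus a list of grounded spans. The tag syntax, the `GroundedSpan` type, and the sample answer are all hypothetical and are not Cohere's actual output format.

```python
import re
from dataclasses import dataclass

@dataclass
class GroundedSpan:
    start: int      # offset into the plain (tag-free) answer text
    end: int
    text: str
    source_id: str  # document or database row the statement is linked to

# Hypothetical tag format, for illustration only.
CITE = re.compile(r'<cite src="([^"]+)">(.*?)</cite>')

def extract_spans(tagged):
    """Strip citation tags and return (plain_text, spans with offsets)."""
    parts, spans, cursor, pos = [], [], 0, 0
    for match in CITE.finditer(tagged):
        before = tagged[cursor:match.start()]
        parts.append(before)
        pos += len(before)
        span_text = match.group(2)
        spans.append(GroundedSpan(pos, pos + len(span_text), span_text, match.group(1)))
        parts.append(span_text)
        pos += len(span_text)
        cursor = match.end()
    parts.append(tagged[cursor:])
    return "".join(parts), spans

# Invented sample output.
tagged_answer = ('Revenue was <cite src="report.pdf#p3">$4.2M in Q1</cite>, and '
                 '<cite src="db:orders/17">order 17 shipped late</cite>.')
plain, spans = extract_spans(tagged_answer)
for span in spans:
    print(span.source_id, "->", plain[span.start:span.end])
```

Storing offsets alongside source identifiers lets a reviewer in a regulated setting click from any claim in the answer to the record that supports it.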
Beyond text, Command A+ supports multimodal inputs within a 128 K context window, handling both text and images, making it suitable for tasks like invoice analysis or technical manual parsing.
Licensing marks a decisive shift. Previous Cohere releases used CC‑BY‑NC 4.0, restricting commercial use. Command A+ adopts the OSI‑approved Apache 2.0 license, allowing unrestricted modification, distribution, and commercialisation without royalty or non‑compete clauses. This change, driven by co‑founder Nick Frosst, frees enterprises from vendor lock‑in.
Performance benchmarks released by Cohere (cited by VentureBeat) show substantial gains over the predecessor Command A Reasoning: on the τ²-Bench Telecom suite, accuracy jumps from 37 % to 85 %; on Terminal-Bench Hard, from 3 % to 25 %; and on the AIME-25 math test, from 57 % to 90 %.
VentureBeat notes that while Command A+ matches larger models in pure reasoning and mathematics, it still trails Chinese open‑source leaders such as DeepSeek in broader agent‑coding and general intelligence tasks.
Strategically, Cohere's hardware-efficient architecture, near-lossless quantisation, and open licensing lower deployment costs. W4A4 improves speed by up to 63 % and cuts latency by 17 % versus the prior model, and the model needs fewer tokens for non-Latin scripts (Arabic ↓20 %, Japanese ↓18 %, Korean ↓16 %). These factors collectively make enterprise-grade AI more affordable.
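Because inference cost generally scales with token count, the per-language reductions translate directly into fewer tokens processed. The short calculation below applies the article's percentages to a hypothetical baseline of one million tokens per language; the baseline volume is an assumption chosen only to make the arithmetic easy to read.

```python
# Token-count reductions reported in the article.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

BASELINE_TOKENS = 1_000_000  # hypothetical volume per language

def tokens_after(baseline_tokens, reduction):
    """Token count once the stated fractional reduction is applied."""
    return baseline_tokens * (1.0 - reduction)

savings = {
    language: BASELINE_TOKENS - tokens_after(BASELINE_TOKENS, reduction)
    for language, reduction in TOKEN_REDUCTION.items()
}

for language, saved in savings.items():
    print(f"{language}: about {saved:,.0f} fewer tokens per {BASELINE_TOKENS:,}")
```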
Recent corporate moves include Cohere’s merger with German AI firm Aleph Alpha, reinforcing a focus on on‑premise AI solutions for governments and large enterprises rather than consumer chatbots.
Overall, the article argues that the open‑source LLM race has entered a second phase: after competing on parameter count, the decisive battle now centres on enabling organisations to run cutting‑edge models in‑house with manageable cost and full ownership.
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.
