When a Celebrity Name Stumped LLMs: The Year‑Old Insight Behind Low‑Frequency Token Degradation
A fan's test involving the idol Ma Jiaqi exposed a large language model's inability to generate his name. This analysis links the failure to low‑frequency token degradation, to academic papers on frequency‑aware prompting and training, and to a tokenizer change by Anthropic that appears to confirm the same direction.
A fan of the Chinese idol Ma Jiaqi discovered that a large language model could correctly recount his biography and variety‑show appearances but repeatedly misspelled his name, prompting a wave of public discussion.
The technical root cause was identified as degradation of low‑frequency tokens in the model's output layer. The name had been merged into a single token that appeared frequently during pre‑training but was virtually absent from the high‑quality instruction‑following data used for supervised fine‑tuning (SFT), so its parameters drifted during fine‑tuning.
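As a rough illustration of the mismatch described above, the sketch below flags tokens that are common in a pre‑training sample but missing from an SFT sample. The function name, the thresholds, and the toy token lists (including the placeholder token `<MaJiaqi>`) are assumptions made for illustration; they are not taken from any model's actual data or from the papers discussed here.

```python
from collections import Counter

def at_risk_tokens(pretrain_tokens, sft_tokens, min_pretrain=3, max_sft=0):
    """Return tokens seen at least `min_pretrain` times in pre-training
    but at most `max_sft` times in the SFT data (hypothetical heuristic)."""
    pretrain_counts = Counter(pretrain_tokens)
    sft_counts = Counter(sft_tokens)
    return sorted(
        token
        for token, count in pretrain_counts.items()
        if count >= min_pretrain and sft_counts[token] <= max_sft
    )

# Toy data: "<MaJiaqi>" stands in for a name merged into a single token.
pretrain = ["the", "idol", "<MaJiaqi>", "sang", "<MaJiaqi>", "the", "<MaJiaqi>", "the"]
sft = ["the", "idol", "sang", "the"]

print(at_risk_tokens(pretrain, sft))  # ['<MaJiaqi>']
```

A real audit would run over tokenizer IDs and full corpora, but the comparison of two frequency tables is the core of the check.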
In 2025, the startup FaceMind presented a paper titled SLoW at the EMNLP main conference. It systematically described the low‑frequency word problem and introduced Dictionary‑based Prompting, which injects frequency‑aware entries into prompts without modifying model weights. In April 2026, the same team released a follow‑up paper, Adam's Law (accepted as an oral presentation at ACL 2026), which extended the analysis to the sentence level. It formulated the Textual Frequency Law (TFL) with a formal proof based on Zipf's law and proposed a distillation method and a curriculum‑learning framework.
Four key milestones form a causal chain: (1) the EMNLP 2025 SLoW paper, (2) the April 2, 2026 arXiv preprint of Adam's Law, (3) Anthropic’s Claude Opus 4.7 tokenizer overhaul in late April 2026, and (4) the public “Ma Jiaqi” incident on May 9, 2026. This timeline shows academic discovery first, industry validation second, and public awareness third.
Anthropic’s tokenizer change multiplied token consumption by roughly 1.0–1.35× (1.20–1.47× reported for English and code text) while leaving CJK token counts almost unchanged (≈1.01×). The community interpreted this as merging or removing low‑frequency tokens, a move that aligns closely with FaceMind’s academic recommendation to intervene on low‑frequency token degradation.
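Multipliers like the ones above are ratios of token counts for the same text under the new and the old tokenizer. The sketch below shows that arithmetic with two toy tokenizers that are purely hypothetical stand‑ins (whitespace splitting versus splitting words into chunks of at most four characters); neither resembles Anthropic's actual tokenizers.

```python
def token_ratio(text, old_tokenize, new_tokenize):
    """Return new_token_count / old_token_count for the same text."""
    old_count = len(old_tokenize(text))
    new_count = len(new_tokenize(text))
    if old_count == 0:
        raise ValueError("text produced no tokens under the old tokenizer")
    return new_count / old_count

def toy_old_tokenizer(text):
    # Hypothetical "old" tokenizer: one token per whitespace-separated word.
    return text.split()

def toy_new_tokenizer(text, max_len=4):
    # Hypothetical "new" tokenizer: long words are split into shorter pieces,
    # mimicking the removal of merged tokens.
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces

sample = "tokenizer changes cost more"
ratio = token_ratio(sample, toy_old_tokenizer, toy_new_tokenizer)
print(f"{ratio:.2f}x")  # 1.75x
```

A ratio near 1.0 means the text is segmented about the same way by both tokenizers, which is what the ≈1.01× CJK figure indicates.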
SLoW (EMNLP 2025) demonstrates that low‑frequency words suffer from systematic comprehension and generation deficits, which can be mitigated by adding a dictionary layer to the prompt. For the Ma Jiaqi case, such prompting can resolve input‑layer issues, whereas degradation that has already entered the generation head requires training‑side data augmentation or synthetic data repair.
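A dictionary layer of this kind might look like the following minimal sketch. It is not the SLoW implementation: the function name, the frequency threshold, the prompt layout, and the toy frequency table and dictionary are all assumptions chosen for illustration.

```python
import re

def build_dictionary_prompt(user_input, frequency, dictionary, threshold=5):
    """Prepend dictionary entries for words whose corpus frequency is below
    `threshold`. Hypothetical helper; the model weights are never touched."""
    entries = []
    seen = set()
    for word in re.findall(r"[A-Za-z']+", user_input):
        key = word.lower()
        if key in seen:
            continue
        seen.add(key)
        if frequency.get(key, 0) < threshold and key in dictionary:
            entries.append(f"- {word}: {dictionary[key]}")
    if not entries:
        return user_input  # nothing rare enough to explain
    return "Dictionary:\n" + "\n".join(entries) + "\n\nInput:\n" + user_input

# Toy frequency table and dictionary (illustrative values only).
frequency = {"the": 1000, "cat": 200, "is": 900}
dictionary = {"crepuscular": "active at dawn and dusk"}

print(build_dictionary_prompt("The cat is crepuscular", frequency, dictionary))
```

Because the intervention happens entirely in the prompt, it can help the model read a rare word, but it cannot repair an output token whose parameters have already degraded.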
Adam's Law (ACL 2026 Oral) asserts that, given unchanged semantics, sentences with higher textual frequency yield better model performance in both prompting and fine‑tuning scenarios. Experiments reported include:
Prompting layer: rewriting inputs to high‑frequency forms raised DeepSeek‑V3 math accuracy from 63.55 % to 71.54 % and LLaMA‑3.3‑70B from 80.49 % to 88.75 %.
Machine translation: BLEU scores improved in 99 of 100 language pairs, with 63 pairs gaining >1 point and 12 pairs gaining >5 points.
Training layer: the CTFT method delivered up to ~30 % relative BLEU improvement on low‑resource language pairs; ablation of the TFD component confirmed that frequency‑based correction alone provides stable gains.
The paper also validated the approach on tasks beyond text generation, including mathematical reasoning, common‑sense reasoning, agent tool‑calling, and nearly a hundred language‑pair translation tasks.
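The prompting‑side idea, preferring a higher‑frequency phrasing when the meaning is unchanged, can be sketched as a selection among paraphrases. The scoring rule used here (mean log relative word frequency with add‑one smoothing) is an assumption for illustration and is not the paper's definition of textual frequency; the word counts are toy values.

```python
import math

def frequency_score(sentence, word_freq, total):
    """Mean log relative frequency of the words in `sentence`.
    Add-one smoothing gives unseen words a finite, low score."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    log_probs = [math.log((word_freq.get(w, 0) + 1) / (total + 1)) for w in words]
    return sum(log_probs) / len(log_probs)

def pick_high_frequency(candidates, word_freq):
    """Return the candidate paraphrase with the highest frequency score."""
    total = sum(word_freq.values())
    return max(candidates, key=lambda s: frequency_score(s, word_freq, total))

# Toy corpus counts (illustrative values only).
word_freq = {"find": 500, "the": 1000, "sum": 300, "of": 800,
             "numbers": 200, "compute": 50, "summation": 2}
candidates = [
    "compute the summation of numbers",
    "find the sum of numbers",
]

print(pick_high_frequency(candidates, word_freq))  # find the sum of numbers
```

In practice the candidates would have to be checked for semantic equivalence before any such selection, since the law as stated applies only when semantics are unchanged.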
The analysis highlights that the industry’s current focus on word‑level token issues leaves the sentence‑level frequency law largely untapped. Zipf's law (1949) states that a word's frequency is roughly inversely proportional to its frequency rank, so a small head of common words accounts for most usage while a long tail of rare words remains. That long tail of low‑frequency tokens cannot be eliminated without harming the model’s ability to handle diverse text. FaceMind’s combined subtractive (prompting, tokenizer tweaks) and additive (frequency distillation, curriculum training) strategies constitute a comprehensive toolbox for addressing both levels of the problem.
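For reference, Zipf's law in its usual rank‑frequency form can be written as below. The symbols (r for rank, N for vocabulary size, s for the exponent, C for a normalizing constant) are standard notation introduced here, not notation from the papers discussed. With s = 1, the share of usage covered by the k most frequent word types is a ratio of harmonic numbers.

```latex
f(r) = \frac{C}{r^{s}}, \qquad s \approx 1

\text{share}(k) = \frac{\sum_{r=1}^{k} 1/r}{\sum_{r=1}^{N} 1/r} = \frac{H_k}{H_N},
\qquad H_n \approx \ln n + \gamma
```

Because H_k grows only logarithmically, the head covers most usage quickly, yet every additional point of coverage requires many more rare word types, which is why the tail cannot simply be dropped.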
In summary, FaceMind’s early academic work accurately predicted a direction later confirmed by Anthropic’s production‑grade tokenizer change. The industry has not yet adopted the sentence‑level methods proposed in Adam's Law, which remain an untapped academic gold mine for large language model research.
