Meituan Technology Team
May 14, 2026 · Artificial Intelligence
General 365: Meituan LongCat’s Open‑Source Benchmark Redefines LLM Reasoning Evaluation
The General 365 benchmark, built from 365 original seed questions and 1,095 variants across eight reasoning challenges, reveals that most mainstream large language models struggle with everyday logical tasks, achieving at most 62.8% accuracy and requiring far more tokens than on traditional subject‑specific tests.
AI reasoning · General 365 · LLM evaluation
9 min read
