Tag

GSM‑Symbolic


Cognitive Technology Team
Oct 16, 2024 · Artificial Intelligence

Large Language Models Lack Formal Reasoning Ability: Five Pieces of Evidence from the GSM‑Symbolic Benchmark

Recent research from a team at Apple led by Iman Mirzadeh introduces the GSM‑Symbolic benchmark. It shows that large language models, despite high scores on GSM8K, suffer significant performance drops when the numbers or names in a problem are changed or when extra clauses are added, which suggests they lack true formal reasoning ability.

AI Safety · GSM‑Symbolic · benchmark