DeepSeek’s MCTS Failure: The ‘Roast Chicken and Baijiu’ Dilemma in LLM Training
The article examines why DeepSeek’s large‑model training cannot yet leverage Monte‑Carlo Tree Search, detailing its reliance on SFT, GRPO‑driven CoT activation and rejection‑sampling, contrasting this with Google’s PRM‑based approaches, and proposing a MCTS‑powered data‑generation pipeline to overcome the “roast chicken and baijiu” training dilemma.
