Can Small Models Overthink? TaH Skips 93% Redundant Iterations and Boosts Accuracy

TaH, a selective latent‑iteration method for small language models, identifies and avoids unnecessary token‑level loops, cutting about 93% of extra iterations while delivering a stable 3.0%‑6.8% accuracy gain across nine math, QA, and code benchmarks.

LLM reasoning · Looped Transformer · TaH

0 likes · 14 min read

Alibaba Cloud Developer

Nov 18, 2025 · Artificial Intelligence

How ReAct and Reflexion Boost Large Language Models for Complex, Real‑World Tasks

The article explains the limitations of large language models on multi‑step reasoning, real‑time information retrieval, and planning, then introduces the ReAct (Reasoning + Acting) framework and its Reflexion extension, detailing their mechanisms, examples, performance gains, practical applications, and future research directions.

LLM reasoning · ReAct · Reflexion

0 likes · 16 min read

HyperAI Super Neural

Sep 30, 2025 · Artificial Intelligence

OnePiece: Applying LLM‑Style Reasoning to Item‑ID Sequences for Generative Recommendation

The article presents the OnePiece framework, which injects LLM‑style context engineering and latent reasoning into item‑ID based search‑and‑recommendation models. It details the design choices, training tricks, and attention analysis, and reports online gains of around 1% in GMV and ad revenue, offering a thorough technical dissection of generative recommendation in industrial settings.

Context Engineering · Generative Recommendation · LLM reasoning

0 likes · 31 min read

Tencent Technical Engineering

Feb 21, 2025 · Artificial Intelligence

DeepSeek-R1: Enhancing Reasoning Capabilities in LLMs via Reinforcement Learning

DeepSeek‑R1 demonstrates that large‑scale reinforcement learning, built on Group Relative Policy Optimization (GRPO) and a rule‑based reward scheme, can markedly boost reasoning in LLMs without heavy supervised fine‑tuning. A brief cold‑start SFT phase, two‑stage alignment, and knowledge distillation further improve performance and efficiency, although challenges such as language mixing remain.

DeepSeek-R1 · GRPO · LLM reasoning

0 likes · 21 min read

Architect

Feb 6, 2025 · Artificial Intelligence

DeepSeek‑R1: Reinforcement‑Learning‑Driven Long‑Chain Reasoning for Large Language Models

The article reviews DeepSeek‑R1, detailing its reinforcement‑learning‑based training pipeline that uses minimal supervised data, cold‑start fine‑tuning, multi‑stage RL, rejection‑sampling SFT, and distillation to achieve reasoning performance comparable to OpenAI‑o1‑1217, while also discussing successful contributions and failed experiments.

AI research · DeepSeek-R1 · LLM reasoning

0 likes · 11 min read

Baobao Algorithm Notes

Oct 11, 2024 · Artificial Intelligence

How Does OpenAI’s o1 Achieve Self‑Correction? A Deep Dive into MCTS and SCoRe

This article examines the self‑correction capability of OpenAI’s o1 model by linking test‑time scaling, MCTS‑style reasoning, and DeepMind’s SCoRe reinforcement‑learning framework. It walks through step‑by‑step demos, hypothesizes about internal judgment mechanisms, and proposes training pipelines that combine self‑generated data with post‑training RL.

LLM reasoning · MCTS · OpenAI

0 likes · 12 min read
