Tagged articles
5 articles
Data Party THU
May 4, 2026 · Artificial Intelligence

Understanding the Mathematical Foundations of Reinforcement Learning

This article provides a concise overview of a ten‑chapter reinforcement‑learning textbook, outlining the progression from basic concepts such as states and rewards to advanced algorithms like policy gradients and actor‑critic methods, and explains how each chapter builds on the previous ones.

Bellman equation · Monte Carlo · Policy Gradient
11 min read
Data Party THU
Oct 22, 2025 · Artificial Intelligence

Demystifying Large‑Model Reinforcement Learning: From MDP Basics to Bellman and Advantage Functions

This article provides a comprehensive introduction to reinforcement learning for large language models, covering the Markov Decision Process formulation, the four core elements of RL, state‑value and action‑value functions, Bellman equations, and the advantage function that underpins modern policy‑gradient algorithms.

AI fundamentals · Bellman equation · Large Language Model
13 min read
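The quantities this article covers can be made concrete with a small numerical sketch. The two-state, two-action MDP below (transition probabilities, rewards, discount γ = 0.9 and the uniform policy) is entirely hypothetical and not taken from the article: the code solves the Bellman expectation equation V = R_π + γP_πV directly, derives Q(s, a) from V, and takes the advantage A(s, a) = Q(s, a) − V(s).

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative only.
# P[s, a, s2] = probability of moving from s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9
pi = np.array([[0.5, 0.5], [0.5, 0.5]])  # pi[s, a]: uniform random policy

# Bellman expectation equation in matrix form: V = R_pi + gamma * P_pi @ V
R_pi = (pi * R).sum(axis=1)            # expected reward in each state under pi
P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)

# Action values: Q(s, a) = R(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2)
Q = R + gamma * P @ V
# Advantage: how much better action a is than the policy's average at s
A = Q - V[:, None]

print("V =", V)
print("Q =", Q)
print("A =", A)
```

Under π the advantages at each state average to zero, which is what makes A(s, a) a natural signal for whether a particular action is better or worse than the policy's typical behaviour.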
Didi Tech
Aug 28, 2025 · Artificial Intelligence

Why Temporal Difference Beats Monte Carlo: Mastering the Bellman Equation

Explore how the Bellman equation underpins reinforcement learning, comparing Dynamic Programming, Monte Carlo, and Temporal‑Difference methods, and discover why TD’s low‑variance, online updates make it a powerful bridge between model‑based planning and sample‑based learning.

Bellman equation · Monte Carlo · Q-Learning
21 min read
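The difference between the Monte Carlo and TD(0) updates can be sketched on a toy problem. The example assumes the five-state random walk familiar from Sutton and Barto (states 1 to 5, terminals at 0 and 6, reward +1 only for exiting on the right, γ = 1), which is not necessarily the example the article uses; the step sizes and episode counts are arbitrary choices. Both methods nudge V(s) toward a target: MC waits for the complete return G_t, while TD(0) updates after every step using the bootstrapped target r + γV(s′).

```python
import random

N = 5      # non-terminal states 1..N; states 0 and N + 1 are terminal
START = 3  # every episode starts in the middle state
TRUE_V = [s / (N + 1) for s in range(1, N + 1)]  # analytic values for this walk


def episode(rng):
    """Generate one episode as a list of (state, reward, next_state) steps."""
    s, steps = START, []
    while 0 < s < N + 1:
        s2 = s + rng.choice((-1, 1))
        r = 1.0 if s2 == N + 1 else 0.0
        steps.append((s, r, s2))
        s = s2
    return steps


def td0(episodes, alpha=0.05, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.0] + [0.5] * N + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        for s, r, s2 in episode(rng):
            # Online update after each step, bootstrapping from V[s2]
            V[s] += alpha * (r + gamma * V[s2] - V[s])
    return V[1:N + 1]


def mc(episodes, alpha=0.05, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = [0.0] + [0.5] * N + [0.0]
    for _ in range(episodes):
        G = 0.0
        # Every-visit constant-alpha MC: walk the finished episode backwards
        # to accumulate the return G_t, then move V[s] toward it
        for s, r, _ in reversed(episode(rng)):
            G = r + gamma * G
            V[s] += alpha * (G - V[s])
    return V[1:N + 1]


def rms(V):
    """Root-mean-square error against the analytic state values."""
    return (sum((v - t) ** 2 for v, t in zip(V, TRUE_V)) / N) ** 0.5


print("TD(0) RMS error:", rms(td0(500)))
print("MC    RMS error:", rms(mc(500)))
```

The TD(0) target depends on a single random transition, so it has lower variance than the full return, at the cost of bias from the current estimate V(s′). Comparing the two errors across several seeds and step sizes shows that trade-off better than a single run does.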
AI Algorithm Path
May 19, 2025 · Artificial Intelligence

Understanding Policy Evaluation and Improvement in Reinforcement Learning

This article explains how to solve Bellman equations, use iterative policy‑evaluation methods, apply the policy‑improvement theorem, and combine both steps in policy iteration, value iteration, and asynchronous variants, illustrated with a 5‑state example and a 4×4 gridworld.

Bellman equation · GridWorld · generalized policy iteration
15 min read
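Iterative policy evaluation on a 4×4 gridworld can be sketched as below. The setup is assumed from Sutton and Barto's Example 4.1 (terminal states in two opposite corners, reward −1 per move, moves off the grid leave the agent in place, equiprobable random policy, γ = 1) and may differ from the article's own gridworld; the function and constant names are illustrative.

```python
import numpy as np

SIZE = 4
TERMINALS = {(0, 0), (SIZE - 1, SIZE - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move with reward -1; bumping into a wall leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        r, c = state
    return (r, c), -1.0


def evaluate_random_policy(gamma=1.0, theta=1e-6):
    """Sweep Bellman expectation backups until the largest change is below theta."""
    V = np.zeros((SIZE, SIZE))
    while True:
        delta = 0.0
        for r in range(SIZE):
            for c in range(SIZE):
                if (r, c) in TERMINALS:
                    continue  # terminal values stay at 0
                # Backup under the uniform policy pi(a | s) = 1/4
                v_new = sum(
                    0.25 * (reward + gamma * V[nxt])
                    for nxt, reward in (step((r, c), a) for a in ACTIONS)
                )
                delta = max(delta, abs(v_new - V[r, c]))
                V[r, c] = v_new  # in-place (Gauss-Seidel style) sweep
        if delta < theta:
            return V


print(np.round(evaluate_random_policy()).astype(int))
```

With these assumptions, the printed values should match the converged grid in Sutton and Barto's Figure 4.1, e.g. −14 beside each terminal corner and −22 in the two remaining corners. Acting greedily with respect to these values is the policy-improvement step that policy iteration alternates with this evaluation loop.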
AI Algorithm Path
May 18, 2025 · Artificial Intelligence

Reinforcement Learning Tutorial Part 1: Core Concepts Explained

This article introduces the fundamental concepts of reinforcement learning, covering the agent‑environment interaction, key terminology, reward structures, task types, policies, value functions, the Bellman equations, and how optimal policies are derived and approximated in practice.

Bellman equation · Markov Decision Process · Optimal Policy
13 min read