Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition
This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.