Reinforcement Learning Digest Part 2: Bellman Equations, Generalized Policy Iteration and Monte Carlo

In the last article, I have introduced Reinforcement learning Markov Decision Process (MDP) framework, discounted expected rewards and value and policy functions definitions. In this article, we will continue the definition of the MDP framework explaining Bellman and Bellman optimality equations. Additionally we will have describe our first reinforcement learning algrithm: Monte Carlo. So let us start…

Bellman equations