In the last article, we discussed Q-learning and saw its desirable convergence properties. Nevertheless, Q-learning has one fundamental limitation that prevents it from being applied to more complex RL tasks. During learning, Q-learning keeps a Q-value for every state-action pair. In FrozenLake with a 4x4 grid, there are 16 states and 4 actions, leading to a Q-table of size 4x4x4 = 64. With the number of actions fixed, the size of the Q-table grows linearly with the number of states. This becomes limiting very quickly for RL tasks with much larger state spaces.
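To make this growth concrete, here is a minimal Python sketch that builds a tabular Q-function as a plain list of lists and counts its entries for FrozenLake-style grids of increasing size. The representation and the helper names `make_q_table` and `q_table_size` are hypothetical, chosen only for illustration.

```python
def make_q_table(n_states, n_actions):
    """Tabular Q-learning: one zero-initialised Q-value per (state, action) pair."""
    return [[0.0] * n_actions for _ in range(n_states)]


def q_table_size(table):
    """Total number of stored Q-values."""
    return sum(len(row) for row in table)


# FrozenLake 4x4: 16 cells (states) and 4 actions (left, down, right, up)
frozen_lake_q = make_q_table(n_states=4 * 4, n_actions=4)
print(q_table_size(frozen_lake_q))  # 64

# With the action count fixed, the table grows linearly with the number of states
for side in (4, 8, 32):
    n_states = side * side
    print(f"{side}x{side} grid: {n_states} states -> {q_table_size(make_q_table(n_states, 4))} entries")
```

An 8x8 grid already needs 256 entries and a 32x32 grid 4096, and an environment with a very large or continuous state space cannot be tabulated at all.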

Usage of Neural Networks

So clearly we need a better way to approximate the Q-value function that does not…
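The paragraph leads into neural networks. As a much smaller stand-in, the sketch below replaces the table with a linear approximator Q(s, a) ≈ w[a] · phi(s) trained with a semi-gradient Q-learning update. This is a swapped-in technique, not the author's network: the class `LinearQ`, the feature function `grid_features` and all hyperparameters are hypothetical. What it illustrates is that the number of parameters depends on the number of features, not on the number of states.

```python
class LinearQ:
    """Hypothetical linear Q-function: Q(s, a) ~= w[a] . phi(s)."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99):
        # One weight vector per action; its size is independent of the number of states
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha = alpha
        self.gamma = gamma

    def q(self, phi, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], phi))

    def update(self, phi, a, r, phi_next, done):
        """Semi-gradient Q-learning step towards r + gamma * max_b Q(s', b)."""
        if done:
            target = r
        else:
            target = r + self.gamma * max(self.q(phi_next, b) for b in range(len(self.w)))
        error = target - self.q(phi, a)
        # For a linear model, the gradient of Q(s, a) with respect to w[a] is phi(s)
        self.w[a] = [wi + self.alpha * error * xi for wi, xi in zip(self.w[a], phi)]
        return error


def grid_features(state, side):
    """Hypothetical features for a side x side grid: normalised row, column and a bias."""
    row, col = divmod(state, side)
    return [row / side, col / side, 1.0]


# A 100x100 grid would need 100 * 100 * 4 = 40000 table entries
model = LinearQ(n_features=3, n_actions=4)
err = model.update(grid_features(5, 100), a=2, r=1.0, phi_next=grid_features(6, 100), done=False)
print(err)  # 1.0 (all weights start at zero, so the first TD error equals the reward)
print(sum(len(w_a) for w_a in model.w))  # 12 parameters
```

A linear model over such simple features is far less expressive than a table, which is why richer approximators such as neural networks are used in practice.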

In the last article, I explained the generalized policy iteration process and described our first reinforcement learning algorithm: Monte Carlo. In this article, we will discuss the drawbacks of Monte Carlo and explore two other algorithms that can help the agent overcome its shortcomings.

The Monte Carlo algorithm learns from complete episodes. This has the following drawbacks:

  • Monte Carlo cannot be used for continuing (non-episodic) tasks.
  • Monte Carlo can be very slow for environments with long episodes, since no update happens until an episode ends.
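Both drawbacks follow from how Monte Carlo computes its targets: the return of each step is a discounted sum over all future rewards, so it is only known once the episode has terminated. Below is a minimal sketch of an every-visit, constant-alpha Monte Carlo update, one common variant assumed here; the function names and the (state, action, reward) episode format are hypothetical.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return of each step: rewards[t] + gamma * (return of step t + 1), computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_update(q, episode, alpha=0.1, gamma=0.9):
    """Every-visit constant-alpha Monte Carlo update from ONE COMPLETE episode.

    episode: list of (state, action, reward) tuples in the order they occurred.
    """
    rewards = [r for _, _, r in episode]
    for (s, a, _), g in zip(episode, discounted_returns(rewards, gamma)):
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (g - old)
    return q


# Nothing can be updated until the episode has ended and all rewards are known
episode = [(0, "right", 0.0), (1, "down", 0.0), (5, "right", 1.0)]
print([round(g, 3) for g in discounted_returns([r for _, _, r in episode])])  # [0.81, 0.9, 1.0]
q = mc_update({}, episode)
```

In a continuing task there is no final step to start the backward pass from, and in a long episode every Q-value waits until the very end.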


RL algorithms that can update Q-value estimates without waiting for complete episodes are called temporal-difference (TD) methods. The SARSA algorithm is…
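For contrast with Monte Carlo, here is a minimal sketch of the standard SARSA step, which bootstraps from the current estimate Q(s', a') after a single transition instead of waiting for the full return. The ε-greedy behaviour policy, the dictionary-based Q-table and the function names are assumptions made for illustration.

```python
import random


def epsilon_greedy(q, state, actions, epsilon=0.1):
    """Assumed behaviour policy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)], applied after ONE step."""
    target = r if done else r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


actions = ["left", "down", "right", "up"]
q = {(1, "down"): 0.5}
# epsilon=0 makes the choice greedy (deterministic) for this example
a_next = epsilon_greedy(q, state=1, actions=actions, epsilon=0.0)
new_value = sarsa_update(q, s=0, a="right", r=0.0, s_next=1, a_next=a_next, done=False)
print(a_next, round(new_value, 4))  # down 0.045
```

Because each update needs only (s, a, r, s', a'), the same step applies to continuing tasks and starts learning from the first transition of a long episode.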

In the last article, I introduced the reinforcement learning Markov Decision Process (MDP) framework, discounted expected rewards, and the definitions of the value and policy functions. In this article, we will continue defining the MDP framework by explaining the Bellman and Bellman optimality equations. Additionally, we will describe our first reinforcement learning algorithm: Monte Carlo. So let us start…

Bellman equations

In the last article, value and Q functions were defined as:


Reinforcement learning is an important type of machine learning used in a vast range of applications and fields, including robotics, genetics, financial applications and recommendation systems, to mention a few. In this series of articles, I aim to take the reader on a journey to learn enough about this topic. The goal is to build knowledge of reinforcement learning starting from basic principles and gradually moving to more advanced aspects. The articles will balance theory with practical demos that help the reader practice what has been learnt and cement understanding. So let us start the journey…


Reinforcement learning…

Ahmed El-Khouly

Technical lead of IBM Cognos recommender system
