In the last article, we discussed Q-learning and saw its desirable convergence properties. Nevertheless, Q-learning has one fundamental limitation that prevents it from being applied to more complex RL tasks. During learning, Q-learning stores a Q-value for every state-action pair. In FrozenLake with a 4x4 grid, there are…