Reinforcement Learning Digest Part 4: Deep Q-Networks (DQN) and Double Deep Q-Networks (DDQN)
In the last article, we discussed Q-learning and saw its desirable convergence properties. Nevertheless, Q-learning has one fundamental limitation that prevents it from being applied to more complex RL tasks: during learning, Q-learning stores a Q-value for every state-action pair. In FrozenLake with a 4x4 grid there are 16 states and 4 actions, giving a Q-table with 16 x 4 = 64 entries. The size of the Q-table grows linearly with the number of states, which becomes limiting very quickly for RL tasks with much larger state spaces.
Usage of Neural Networks
So clearly we need a way to approximate the Q-value function whose memory requirements do not grow with the size of the state space. Neural networks are known to be good function approximators, so it seems natural to use a NN to learn the Q-value function with the Q-target:
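$$y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a';\, \theta)$$

where $r_{t+1}$ is the reward received after taking action $a_t$ in state $s_t$, $\gamma$ is the discount factor, and $\theta$ are the network weights.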
During back-propagation, the NN weights are updated according to the following rule:
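This is the standard gradient-descent step on the squared TD error $\big(y_t - Q(s_t, a_t; \theta)\big)^2$:

$$\theta \leftarrow \theta + \alpha\, \big(y_t - Q(s_t, a_t; \theta)\big)\, \nabla_{\theta} Q(s_t, a_t; \theta)$$

where $\alpha$ is the learning rate.

To make the idea concrete, here is a minimal sketch of a single DQN update step. It assumes PyTorch, a small fully connected Q-network, and a discrete-action environment; the names `QNetwork` and `dqn_update` are illustrative, not taken from this article.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected network mapping a state to one Q-value per action."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def dqn_update(q_net, optimizer, batch, gamma=0.99):
    """One gradient step on the squared TD error against the Q-target.

    batch = (states, actions, rewards, next_states, dones), where actions is a
    LongTensor of shape (B,) and dones is a float tensor (1.0 for terminal).
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s_t, a_t; theta) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Q-target: r + gamma * max_a' Q(s_{t+1}, a'; theta); no bootstrap on terminal states.
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q

    loss = nn.functional.mse_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()   # back-propagation computes the gradient of the TD error
    optimizer.step()  # gradient step updates the weights theta
    return loss.item()
```

In full DQN the target is usually computed with a separate, periodically updated target network; the sketch above keeps a single network to match the update rule described so far.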
Experience Replay Memory
During training, the agent continuously interacts with the environment to collect new experiences, which are used to train the NN to learn the Q-value function. Over long training cycles, the NN weights are continuously updated, causing the NN to eventually forget older experiences. Additionally, as experiences are collected in sequences of consecutive time steps, the NN can…