# Rainbow

## Overview
The Rainbow algorithm is an extension of DQN that combines multiple improvements:
- Prioritized Experience Replay
- Dueling Network Architecture
- Noisy Networks
- Distributional Q-Learning
- N-step Learning
- Double Q-Learning
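Most of these pieces meet in the training update: the target network's return distribution for the next action (selected with the online network, i.e. double Q-learning) is shifted by the n-step return and projected back onto the fixed support of atoms as in C51, and the cross-entropy to that projected distribution, weighted by prioritized-replay importance weights, is minimized. Below is a minimal sketch of the projection step with illustrative names; it is not CleanRL's exact code.

```python
import torch

def project_nstep_distribution(next_pmfs, nstep_returns, dones, atoms, gamma, n_step):
    """Project the bootstrapped distribution G_t^(n) + gamma^n * z onto the fixed atoms.

    next_pmfs:     (batch, n_atoms) target-network probabilities for the chosen next action
    nstep_returns: (batch,) discounted sum of the next n rewards
    dones:         (batch,) 1.0 if the n-step window ended in a terminal state
    atoms:         (n_atoms,) fixed support z_1 < ... < z_K
    """
    n_atoms = atoms.numel()
    v_min, v_max = float(atoms[0]), float(atoms[-1])
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Bellman-updated atom locations, clipped to the support's range
    tz = nstep_returns.unsqueeze(1) + (1.0 - dones).unsqueeze(1) * (gamma ** n_step) * atoms
    tz = tz.clamp(v_min, v_max)

    # Split each atom's probability mass between its two nearest neighbours on the support
    b = (tz - v_min) / delta_z
    lower = b.floor().clamp(0, n_atoms - 1)
    upper = b.ceil().clamp(0, n_atoms - 1)
    d_m_l = (upper + (lower == upper).float() - b) * next_pmfs
    d_m_u = (b - lower) * next_pmfs

    target_pmfs = torch.zeros_like(next_pmfs)
    target_pmfs.scatter_add_(1, lower.long(), d_m_l)
    target_pmfs.scatter_add_(1, upper.long(), d_m_u)
    return target_pmfs  # per-sample loss: -(target_pmfs * log_pmfs_taken).sum(dim=1)
```

In Rainbow, the resulting per-sample loss is also what sets the sampled transitions' new priorities in the replay buffer.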
Original paper:

* [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/abs/1710.02298) (Hessel et al., 2018)
## Implemented Variants

| Variants Implemented | Description |
| --- | --- |
| `rainbow_atari.py`, docs | For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques. |
## `rainbow_atari.py`

The `rainbow_atari.py` script has the following features:

- For playing Atari games. It uses convolutional layers and common Atari-based pre-processing techniques.
- Works with Atari's pixel `Box` observation space of shape `(210, 160, 3)` (see the snippet below)
- Works with the `Discrete` action space
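As a quick sanity check of those spaces, the raw (unwrapped) environment can be inspected directly. This assumes a Gymnasium installation with the Atari dependencies from the Usage section below; older CleanRL versions use `gym` instead of `gymnasium`, and some ale-py versions require explicitly registering the ALE environments first.

```python
import gymnasium as gym

# Raw Atari environment, before frame-skipping / resizing / grayscale wrappers
env = gym.make("BreakoutNoFrameskip-v4")
print(env.observation_space)  # Box(0, 255, (210, 160, 3), uint8)
print(env.action_space)       # Discrete(4)
```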
### Usage

```bash
poetry install -E atari
poetry run python cleanrl/rainbow_atari.py --env-id BreakoutNoFrameskip-v4
poetry run python cleanrl/rainbow_atari.py --env-id PongNoFrameskip-v4
```

Or, with pip:

```bash
pip install -r requirements/requirements-atari.txt
python cleanrl/rainbow_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/rainbow_atari.py --env-id PongNoFrameskip-v4
```
### Explanation of the logged metrics

Running `python cleanrl/rainbow_atari.py` will automatically record various metrics such as episodic returns and losses in TensorBoard. Below is the documentation for these metrics:

- `charts/episodic_return`: episodic return of the game
- `charts/SPS`: number of steps per second
- `losses/td_loss`: the n-step distributional TD loss
- `losses/q_values`: the mean Q values of the sampled data in the replay buffer
- `charts/beta`: the beta value of the prioritized experience replay (annealed as sketched below)
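`charts/beta` is the importance-sampling exponent of prioritized experience replay, usually annealed linearly from its starting value to 1 over training so that the bias correction becomes exact late in training. Below is a minimal sketch assuming a linear schedule; the helper names and the `beta_start=0.4` default are illustrative, not CleanRL's actual arguments.

```python
import numpy as np

def per_beta(global_step: int, total_timesteps: int, beta_start: float = 0.4) -> float:
    """Linearly anneal the prioritized-replay exponent beta from beta_start to 1."""
    frac = min(global_step / total_timesteps, 1.0)
    return beta_start + frac * (1.0 - beta_start)

def importance_weights(sample_probs: np.ndarray, buffer_size: int, beta: float) -> np.ndarray:
    """w_i = (N * P(i)) ** (-beta), normalized by the largest weight in the batch."""
    weights = (buffer_size * sample_probs) ** (-beta)
    return weights / weights.max()
```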
### Implementation details

`rainbow_atari.py` is based on (Hessel et al., 2018) and uses the same hyperparameters; see Table 1 of the paper. However, there are a few implementation differences:

- `rainbow_atari.py` uses the more popular Adam optimizer with `--learning-rate=0.0000625` as follows:

    ```python
    optim.Adam(q_network.parameters(), lr=0.0000625)
    ```

    whereas the original DQN setup uses the RMSProp optimizer with learning rate `2.5e-4`, gradient momentum `0.95`, squared gradient momentum `0.95`, and min squared gradient `0.01` as follows:

    ```python
    optim.RMSprop(
        q_network.parameters(),
        lr=2.5e-4,
        momentum=0.95,
        # PyTorch's RMSprop does not directly support
        # squared gradient momentum and min squared gradient,
        # so we are not sure what to put here.
    )
    ```
### Experiment results

To run benchmark experiments, see and execute the commands in `benchmark/rainbow.sh`.
Below are the average episodic returns for `rainbow_atari.py`.
| Environment | Rainbow | C51 | DQN |
| --- | --- | --- | --- |
| AlienNoFrameskip-v4 | 2907.03 ± 355.53 | 1831.00 ± 98.23 | 1275.77 ± 65.41 |
| AssaultNoFrameskip-v4 | 7661.11 ± 226.51 | 3322.54 ± 94.46 | 3845.70 ± 443.31 |
| GopherNoFrameskip-v4 | 8111.07 ± 300.60 | 8715.60 ± 492.23 | 10415.53 ± 3438.12 |
| YarsRevengeNoFrameskip-v4 | 63536.39 ± 5432.22 | 11010.99 ± 904.27 | 15290.12 ± 8010.56 |
| SpaceInvadersNoFrameskip-v4 | 1835.52 ± 205.10 | 2009.05 ± 226.96 | 1441.68 ± 23.92 |
| MsPacmanNoFrameskip-v4 | 3113.30 ± 393.00 | 2445.13 ± 30.16 | 2109.43 ± 49.85 |
Learning curves:
Rainbow shows better performance than C51 and DQN.

Rainbow is also more sample efficient than C51 and DQN.

Rainbow obtains better aggregated performance than C51 and DQN.
