
Rainbow

Overview

The Rainbow algorithm is an extension of DQN that combines multiple improvements:

  • Prioritized Experience Replay
  • Dueling Network Architecture
  • Noisy Networks (see the sketch after this list)
  • Distributional Q-Learning
  • N-step Learning
  • Double Q-Learning
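Of the components above, Noisy Networks replaces ε-greedy exploration with learned parameter noise on the fully connected layers. Below is a minimal sketch of a factorised-Gaussian noisy linear layer in PyTorch; the class name, the `std_init=0.5` default, and the exact initialisation are illustrative and are not necessarily what rainbow_atari.py uses.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (illustrative sketch)."""

    def __init__(self, in_features: int, out_features: int, std_init: float = 0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.std_init = std_init
        # Learnable means and standard deviations of the weights/biases.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers (not trained, resampled via reset_noise()).
        self.register_buffer("weight_epsilon", torch.empty(out_features, in_features))
        self.register_buffer("bias_epsilon", torch.empty(out_features))
        self.reset_parameters()
        self.reset_noise()

    def reset_parameters(self):
        mu_range = 1.0 / math.sqrt(self.in_features)
        self.weight_mu.data.uniform_(-mu_range, mu_range)
        self.weight_sigma.data.fill_(self.std_init / math.sqrt(self.in_features))
        self.bias_mu.data.uniform_(-mu_range, mu_range)
        self.bias_sigma.data.fill_(self.std_init / math.sqrt(self.out_features))

    @staticmethod
    def _scale_noise(size: int) -> torch.Tensor:
        # f(x) = sign(x) * sqrt(|x|), as used for factorised noise.
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        eps_in = self._scale_noise(self.in_features)
        eps_out = self._scale_noise(self.out_features)
        self.weight_epsilon.copy_(torch.outer(eps_out, eps_in))
        self.bias_epsilon.copy_(eps_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            weight = self.weight_mu + self.weight_sigma * self.weight_epsilon
            bias = self.bias_mu + self.bias_sigma * self.bias_epsilon
            return F.linear(x, weight, bias)
        # At evaluation time, act greedily with respect to the mean weights.
        return F.linear(x, self.weight_mu, self.bias_mu)
```

The usual pattern is to use such layers for the dueling value/advantage heads and to resample the noise (via reset_noise()) each time the network is queried during training.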

Original papers:

  • Rainbow: Combining Improvements in Deep Reinforcement Learning

Reference resources:

Implemented Variants

| Variants Implemented | Description |
| --- | --- |
| rainbow_atari.py, docs | For playing Atari games. It uses convolutional layers and common Atari-based preprocessing techniques. |

rainbow_atari.py

The rainbow_atari.py has the following features:

  • For playing Atari games. It uses convolutional layers and common Atari-based preprocessing techniques (a typical wrapper stack is sketched after this list).
  • Works with Atari's pixel Box observation space of shape (210, 160, 3)
  • Works with the Discrete action space
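The preprocessing mentioned above is the standard Atari pipeline: random no-op resets, frame skipping with max-pooling, treating a lost life as episode end, reward clipping, 84×84 grayscale frames, and a stack of 4 frames. Below is a rough sketch of such a wrapper stack, assuming gymnasium (pre-1.0 wrapper names) and the stable_baselines3 Atari wrappers used elsewhere in CleanRL; it is illustrative and may differ from the exact stack in rainbow_atari.py.

```python
import gymnasium as gym
from stable_baselines3.common.atari_wrappers import (
    ClipRewardEnv,
    EpisodicLifeEnv,
    FireResetEnv,
    MaxAndSkipEnv,
    NoopResetEnv,
)


def make_env(env_id: str = "BreakoutNoFrameskip-v4", seed: int = 1):
    env = gym.make(env_id)
    env = gym.wrappers.RecordEpisodeStatistics(env)  # logs episodic returns
    env = NoopResetEnv(env, noop_max=30)             # random number of no-ops at reset
    env = MaxAndSkipEnv(env, skip=4)                 # repeat actions, max-pool over frames
    env = EpisodicLifeEnv(env)                       # treat a lost life as episode end
    if "FIRE" in env.unwrapped.get_action_meanings():
        env = FireResetEnv(env)                      # press FIRE on reset where required
    env = ClipRewardEnv(env)                         # clip rewards to {-1, 0, +1}
    env = gym.wrappers.ResizeObservation(env, (84, 84))
    env = gym.wrappers.GrayScaleObservation(env)
    env = gym.wrappers.FrameStack(env, 4)            # final observation: (4, 84, 84)
    env.action_space.seed(seed)
    return env
```

The resulting (4, 84, 84) uint8 observations are what the convolutional network consumes, typically after scaling pixel values to [0, 1].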

Usage

With poetry:

poetry install -E atari
poetry run python cleanrl/rainbow_atari.py --env-id BreakoutNoFrameskip-v4
poetry run python cleanrl/rainbow_atari.py --env-id PongNoFrameskip-v4

Or, with pip:

pip install -r requirements/requirements-atari.txt
python cleanrl/rainbow_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/rainbow_atari.py --env-id PongNoFrameskip-v4

Explanation of the logged metrics

Running python cleanrl/rainbow_atari.py will automatically record various metrics such as losses and episodic returns in TensorBoard. Below is the documentation for these metrics:

  • charts/episodic_return: episodic return of the game
  • charts/SPS: number of steps per second
  • losses/td_loss: the n-step distributional TD loss (see the sketch after this list)
  • losses/q_values: the mean Q values of the sampled data in the replay buffer
  • charts/beta: the importance-sampling exponent β of prioritized experience replay, annealed toward 1 over the course of training
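For reference, losses/td_loss above is the cross-entropy between the predicted return distribution and an n-step target distribution projected onto the fixed support, as in C51. Below is a minimal sketch of that projection and loss; the function and argument names are illustrative, and the inputs are assumed to have already been gathered for the double-Q-selected actions. With prioritized replay, the returned per-sample losses would be multiplied by the importance-sampling weights before averaging, and their magnitudes fed back as new priorities.

```python
import torch
import torch.nn.functional as F


def categorical_td_loss(logits, target_probs, returns, dones, gamma_n,
                        v_min=-10.0, v_max=10.0, n_atoms=51):
    """Per-sample C51-style loss for n-step targets (illustrative sketch).

    logits:       (batch, n_atoms) online-network logits for the actions taken
    target_probs: (batch, n_atoms) target-network probabilities at the n-step
                  next state, for the actions picked by the online network
                  (double Q-learning)
    returns:      (batch,) accumulated n-step rewards sum_k gamma^k * r_{t+k}
    dones:        (batch,) 1.0 if the episode terminated within the n steps
    gamma_n:      gamma ** n
    """
    delta_z = (v_max - v_min) / (n_atoms - 1)
    support = torch.linspace(v_min, v_max, n_atoms, device=logits.device)

    # Apply the n-step Bellman operator to every atom and clamp to the support.
    tz = (returns.unsqueeze(1)
          + (1.0 - dones).unsqueeze(1) * gamma_n * support).clamp(v_min, v_max)
    b = (tz - v_min) / delta_z
    l, u = b.floor().long(), b.ceil().long()

    # Distribute each target atom's probability mass onto its two neighbours.
    # When b lands exactly on an atom (l == u), all mass goes to that atom.
    proj = torch.zeros_like(target_probs)
    proj.scatter_add_(1, l, target_probs * (u.float() - b + (l == u).float()))
    proj.scatter_add_(1, u, target_probs * (b - l.float()))

    # Cross-entropy between the projected target and the predicted distribution.
    log_p = F.log_softmax(logits, dim=1)
    return -(proj * log_p).sum(dim=1)  # shape (batch,)
```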

Implementation details

rainbow_atari.py is based on (Hessel et al., 2018) and uses the same hyperparameters; see Table 1 in (Hessel et al., 2018) for the full list. A few implementation details are worth noting:

  1. rainbow_atari.py uses the Adam optimizer with --learning-rate=0.0000625 as follows:
    optim.Adam(q_network.parameters(), lr=0.0000625)
    
    which matches the Adam learning rate in Table 1 of (Hessel et al., 2018). The RMSProp configuration used by the original DQN lineage (learning rate 2.5e-4, gradient momentum 0.95, squared gradient momentum 0.95, and min squared gradient 0.01) cannot be reproduced exactly in PyTorch:
    optim.RMSprop(
        q_network.parameters(),
        lr=2.5e-4,
        momentum=0.95,
        # PyTorch's RMSprop exposes `alpha` (the decay of the squared-gradient
        # average) and `eps`, which only roughly correspond to "squared gradient
        # momentum" and "min squared gradient", so the original DQN-style
        # configuration cannot be replicated exactly.
    )
    
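Finally, the charts/beta metric above comes from prioritized experience replay, whose importance-sampling exponent β is annealed toward 1 during training. The sketch below shows the usual linear schedule and proportional sampling with normalized importance weights. It is a simplified O(N) array version (a real buffer would typically use a sum-tree), with the α = 0.5 and β₀ = 0.4 defaults taken from the values reported in (Hessel et al., 2018); the function names are illustrative.

```python
import numpy as np


def linear_beta(step, total_timesteps, beta_start=0.4, beta_end=1.0):
    """Linearly anneal the importance-sampling exponent (logged as charts/beta)."""
    frac = min(step / total_timesteps, 1.0)
    return beta_start + frac * (beta_end - beta_start)


def sample_proportional(priorities, batch_size, beta, alpha=0.5, rng=None):
    """Sample indices with probability proportional to priority**alpha and
    return the normalized importance-sampling weights that scale the loss."""
    rng = rng or np.random.default_rng()
    probs = priorities ** alpha
    probs = probs / probs.sum()
    idxs = rng.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idxs]) ** (-beta)
    weights = weights / weights.max()  # normalize so the largest weight is 1
    return idxs, weights


# Example usage: priorities would be |TD error| plus a small constant per
# stored transition, e.g.
# idxs, w = sample_proportional(np.abs(td_errors) + 1e-6, 32, linear_beta(t, T))
```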
    

Experiment results

To run benchmark experiments, see benchmark/rainbow.sh. Specifically, execute the following command:

benchmark/rainbow.sh
poetry install -E atari
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
    --env-ids PongNoFrameskip-v4 BeamRiderNoFrameskip-v4 BreakoutNoFrameskip-v4 \
    --command "poetry run python cleanrl/rainbow_atari.py --track --capture_video" \
    --num-seeds 3 \
    --workers 1

Below are the average episodic returns for rainbow_atari.py.

| Environment | Rainbow | C51 | DQN |
| --- | --- | --- | --- |
| AlienNoFrameskip-v4 | 2907.03 ± 355.53 | 1831.00 ± 98.23 | 1275.77 ± 65.41 |
| AssaultNoFrameskip-v4 | 7661.11 ± 226.51 | 3322.54 ± 94.46 | 3845.70 ± 443.31 |
| GopherNoFrameskip-v4 | 8111.07 ± 300.60 | 8715.60 ± 492.23 | 10415.53 ± 3438.12 |
| YarsRevengeNoFrameskip-v4 | 63536.39 ± 5432.22 | 11010.99 ± 904.27 | 15290.12 ± 8010.56 |
| SpaceInvadersNoFrameskip-v4 | 1835.52 ± 205.10 | 2009.05 ± 226.96 | 1441.68 ± 23.92 |
| MsPacmanNoFrameskip-v4 | 3113.30 ± 393.00 | 2445.13 ± 30.16 | 2109.43 ± 49.85 |

Learning curves:

Rainbow shows better performance than C51 and DQN.

Rainbow is also more sample efficient than C51 and DQN.

Rainbow obtains better aggregated performance than C51 and DQN.


  1. Hessel, M., Modayil, J., Hasselt, H.V., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M.G., & Silver, D. (2018). Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI.