# How to Make Deep RL Work in Practice

```bibtex
@article{Rao2020HowTM,
  title   = {How to Make Deep RL Work in Practice},
  author  = {Nirnai Rao and Elie Aljalbout and Axel Sauer and Sami Haddadin},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2010.13083}
}
```

In recent years, challenging control problems have become solvable with deep reinforcement learning (RL). Using RL for large-scale real-world applications requires a certain degree of reliability in algorithm performance. Reported results of state-of-the-art algorithms are often difficult to reproduce. One reason is that certain implementation details influence performance significantly; commonly, these details are not highlighted as important techniques to achieve state-of…


#### 5 Citations

Learning Vision-based Reactive Policies for Obstacle Avoidance

- Computer Science
- CoRL
- 2020

The proposed method is shown to efficiently learn stable obstacle-avoidance strategies with a high success rate, while maintaining the closed-loop responsiveness required for critical applications such as human-robot interaction.

A few lessons learned in reinforcement learning for quadcopter attitude control

- Computer Science
- HSCC
- 2021

This paper discusses theoretical as well as practical aspects of training neural networks to control a Crazyflie 2.0 drone, and thoroughly describes the choices of training algorithm, neural network architecture, hyperparameters, observation space, and more.

A learning gap between neuroscience and reinforcement learning

- Computer Science
- ArXiv
- 2021

A T-maze task from neuroscience is extended for use with reinforcement learning algorithms, and it is shown that state-of-the-art algorithms are not capable of solving this problem.

RLOps: Development Life-cycle of Reinforcement Learning Aided Open RAN

- Computer Science
- ArXiv
- 2021

Radio access network (RAN) technologies continue to witness massive growth, with Open RAN gaining the most recent momentum. In the O-RAN specifications, the RAN intelligent controller (RIC) serves as…

Reinforcement Learning with Formal Performance Metrics for Quadcopter Attitude Control under Non-nominal Contexts

- Computer Science
- ArXiv
- 2021

A robust form of signal temporal logic is developed to quantitatively evaluate the vehicle's behavior and measure the performance of controllers, in order to draw conclusions for practical controller design by reinforcement learning.
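The snippet above mentions a *robustness* (quantitative) semantics for signal temporal logic: instead of a pass/fail verdict, a trace is scored by its margin of satisfaction. A minimal sketch of that idea for a single "always" formula, with a hypothetical predicate and trace values not taken from the cited paper:

```python
# Robustness semantics of G(|x_t| < threshold) over a finite trace:
# positive means the formula is satisfied with that margin; negative
# means it is violated. The "always" operator takes the minimum margin
# over all time steps.

def robustness_always_abs_below(trace, threshold):
    return min(threshold - abs(x) for x in trace)

# Hypothetical attitude-error traces (radians).
good_trace = [0.02, -0.05, 0.01, 0.03]
bad_trace = [0.02, 0.12, -0.01]

print(robustness_always_abs_below(good_trace, 0.1))  # positive: satisfied
print(robustness_always_abs_below(bad_trace, 0.1))   # negative: violated
```

This numeric score is what makes such metrics usable as training or evaluation signals for a learned controller.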

#### References

Showing 1–10 of 33 references

Deep Reinforcement Learning that Matters

- Computer Science, Mathematics
- AAAI
- 2018

Challenges posed by reproducibility, proper experimental techniques, and reporting procedures are investigated, and guidelines to make future results in deep RL more reproducible are suggested.

Benchmarking Deep Reinforcement Learning for Continuous Control

- Computer Science, Mathematics
- ICML
- 2016

This work presents a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.

Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

- Computer Science
- ArXiv
- 2017

The significance of hyperparameters in policy gradients for continuous control, general variance in the algorithms, and reproducibility of reported results are investigated, and guidelines on reporting novel results as comparisons against baseline methods are provided.

Continuous control with deep reinforcement learning

- Computer Science, Mathematics
- ICLR
- 2016

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
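The core idea behind this algorithm (DDPG) is the deterministic policy gradient: the actor is improved by pushing its output action uphill on the critic's Q-value via the chain rule. A toy one-dimensional sketch, with a hypothetical analytic critic standing in for the learned critic network:

```python
# Deterministic policy gradient sketch: dJ/dtheta = dQ/da * dmu/dtheta.
# In DDPG, dQ/da comes from backpropagating through the critic network;
# here a hypothetical quadratic critic makes it analytic.

def q_value(s, a):
    return -(a - 2.0 * s) ** 2   # best action for state s is a* = 2*s

def dq_da(s, a):
    return -2.0 * (a - 2.0 * s)  # critic gradient w.r.t. the action

theta = 0.0                      # linear deterministic policy mu(s) = theta*s
states = [0.5, 1.0, 1.5]
lr = 0.05
for _ in range(200):
    # Chain rule, averaged over a batch of states: dmu/dtheta = s.
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    theta += lr * grad           # gradient *ascent* on Q

print(theta)  # converges toward 2.0, the optimal linear gain
```

The same chain-rule structure carries over unchanged when `mu` and `Q` are neural networks and the gradients come from autodiff.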

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies

- Computer Science
- ArXiv
- 2019

This work proposes a novel approach, called Cross-Entropy Guided Policies, or CGP, that aims to combine the stability and performance of iterative sampling policies with the low computational cost of a policy network.
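The "iterative sampling policy" referenced here is the cross-entropy method (CEM) applied to action selection: sample candidate actions from a Gaussian, keep the top-scoring elites under the critic, refit the Gaussian, and repeat. A minimal sketch with a hypothetical one-dimensional critic (CGP additionally distills this sampler into a cheap policy network, which is not shown):

```python
import random

def q_value(a):
    return -(a - 0.7) ** 2   # hypothetical critic: best action is 0.7

def cem_action(q, iters=20, pop=64, elites=8, mu=0.0, sigma=1.0, seed=0):
    """Cross-entropy method over a 1-D action space."""
    rng = random.Random(seed)
    for _ in range(iters):
        samples = [rng.gauss(mu, sigma) for _ in range(pop)]
        best = sorted(samples, key=q, reverse=True)[:elites]  # elites
        mu = sum(best) / len(best)                            # refit mean
        var = sum((a - mu) ** 2 for a in best) / len(best)
        sigma = max(var ** 0.5, 1e-3)                         # refit std
    return mu

print(cem_action(q_value))  # close to 0.7
```

The per-step cost of running this loop at every action is what motivates training a network to imitate it.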

The Mirage of Action-Dependent Baselines in Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2018

This work decomposes the variance of the policy gradient estimator and shows numerically that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains.
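For context on what a baseline does at all: subtracting a baseline from the reward in a score-function (REINFORCE) estimator leaves the gradient unbiased but can shrink its variance dramatically. The cited paper's point is that going from state-dependent to *action*-dependent baselines adds little; this toy Monte Carlo sketch only illustrates the basic baseline effect, with a hypothetical setup of a ~ N(theta, 1), reward f(a) = a, and score function (a - theta):

```python
import random

def grad_samples(theta, baseline, n, rng):
    out = []
    for _ in range(n):
        a = rng.gauss(theta, 1.0)
        out.append((a - baseline) * (a - theta))  # REINFORCE estimator
    return out

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
theta = 5.0
no_baseline = variance(grad_samples(theta, 0.0, 20000, rng))
with_baseline = variance(grad_samples(theta, theta, 20000, rng))  # b = E[f]
print(no_baseline, with_baseline)  # baseline cuts variance sharply here
```

Analytically the two variances here are about 27 and 2, which is why the *first* baseline matters so much and further refinements matter less.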

Addressing Function Approximation Error in Actor-Critic Methods

- Computer Science, Mathematics
- ICML
- 2018

This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
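The "minimum value between a pair of critics" is the clipped double-Q target at the heart of this algorithm (TD3): bootstrap from the more pessimistic of two critics so that a single critic's overestimation cannot compound. A minimal sketch with illustrative numbers:

```python
# Clipped double-Q TD target: terminal transitions get no bootstrap;
# otherwise bootstrap from min(Q1', Q2') to curb overestimation bias.

def td_target(reward, done, gamma, q1_next, q2_next):
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

# The two critics disagree about the next state; the optimistic estimate
# (12.0) is ignored.
print(td_target(reward=1.0, done=False, gamma=0.99, q1_next=12.0, q2_next=9.0))
print(td_target(reward=1.0, done=True, gamma=0.99, q1_next=12.0, q2_next=9.0))
```

Both critics are then regressed toward this single shared target.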

RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms

- Computer Science
- 2018

This work highlights key differences between evaluation in RL and in supervised learning, proposes an evaluation pipeline that can be decoupled from the algorithm code, and argues that such a pipeline should be standardized as a step toward robust and reproducible research in RL.

Playing Atari with Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
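A key ingredient of this work (DQN) is experience replay: transitions are stored in a bounded buffer and training minibatches are drawn uniformly at random, breaking the correlation between consecutive samples. A minimal sketch; capacity and batch size are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.buf = deque(maxlen=capacity)   # oldest transitions fall off
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch from the most
        # recent trajectory.
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

buffer = ReplayBuffer(capacity=100)
for t in range(250):                        # overfill: only last 100 kept
    buffer.push((f"s{t}", 0, 0.0, f"s{t+1}"))
batch = buffer.sample(8)
print(len(buffer), len(batch))              # 100 8
```

The bounded `deque` implements the fixed-size FIFO memory; real implementations store `(state, action, reward, next_state, done)` tuples and sample every gradient step.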

Time Limits in Reinforcement Learning

- Computer Science, Mathematics
- ICML
- 2018

This paper provides a formal account of how time limits should be handled in each of the two cases (tasks that are inherently time-limited and tasks that are not), and explains why failing to do so can cause state aliasing and invalidate experience replay, leading to suboptimal policies and training instability.
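The practical upshot of this distinction: a transition that ends an episode because the task truly terminated should not bootstrap, while one that ends only because a step budget ran out should keep bootstrapping, otherwise the time limit leaks into the value function. A minimal sketch, using the terminated/truncated naming convention (values are illustrative):

```python
# TD target that distinguishes true termination from a timeout.
# Bootstrapping through timeouts avoids state aliasing: two identical
# states should not get different values just because one was observed
# near the step limit.

def bootstrapped_target(reward, gamma, q_next, terminated, truncated):
    if terminated:                 # true terminal state: no future value
        return reward
    # Timeout (truncated) or ordinary step: keep bootstrapping.
    return reward + gamma * q_next

gamma, q_next = 0.99, 10.0
print(bootstrapped_target(1.0, gamma, q_next, terminated=True, truncated=False))
print(bootstrapped_target(1.0, gamma, q_next, terminated=False, truncated=True))
```

Treating a timeout as a terminal state would zero out the second target's future value and systematically bias the learned values downward near the limit.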