Meta Learning the Step Size in Policy Gradient Methods

Authors

Luca Sabbioni, Francesco Corda, Marcello Restelli

Abstract

Policy-based algorithms are among the most widely adopted techniques in model-free reinforcement learning (RL), thanks to their strong theoretical grounding and good properties in continuous action spaces. Unfortunately, these methods require precise, problem-specific hyperparameter tuning to achieve good performance and, as a consequence, they tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the choice of step size has a crucial impact on the ability to learn a high-performing policy: it affects both the speed and the stability of training, and it is often the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, introducing a new formulation, called meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound on the performance across different tasks, we adopt the proposed framework to train a batch RL algorithm that dynamically recommends the most suitable step size for different policies and tasks. Finally, we present an experimental campaign that shows the advantages of selecting an adaptive learning rate in heterogeneous environments.
