Learning to Explore a Class of Multiple Reward-Free Environments

Authors

Mirco Mutti, Mattia Mancassola, Marcello Restelli

Abstract

Several recent works have been dedicated to the pure exploration of a single reward-free environment. Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class. Notably, the problem is inherently multi-objective, as we can trade off the exploration performance between environments in many ways. In this work, we pursue an exploration strategy that is sensitive to the most adverse cases within the class. Hence, we cast the exploration problem as the maximization of the mean of a critical percentile of the state visitation entropy induced by the exploration strategy over the class of environments. Then, we present a policy gradient algorithm, MEMENTO, to optimize the introduced objective through mediated interactions with the class. Finally, we empirically demonstrate the ability of the algorithm to learn to explore challenging classes of continuous environments, and we show that reinforcement learning greatly benefits from the pre-trained exploration strategy when compared to learning from scratch.
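To make the percentile-based objective concrete, the sketch below evaluates it for a fixed exploration strategy: estimate the state visitation entropy in each sampled environment, sort the estimates, and average the worst ⌈αN⌉ of the N values. This is a minimal illustration under loudly stated assumptions: the state space is finite, entropy is computed with a plug-in (histogram) estimate from visited states, and the helper names (`empirical_entropy`, `cvar_lower`, `percentile_entropy_objective`) are hypothetical. It only evaluates the objective; it is not MEMENTO, which optimizes the objective via policy gradient, and it does not reproduce whatever entropy estimator the paper uses for continuous environments.

```python
import math
from collections import Counter
from typing import Sequence


def empirical_entropy(states: Sequence[int]) -> float:
    """Plug-in (histogram) entropy of a discrete state-visitation sample, in nats.

    Assumption: states are hashable discrete labels; this is NOT the
    estimator used for continuous environments in the paper.
    """
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def cvar_lower(values: Sequence[float], alpha: float) -> float:
    """Mean of the lowest ceil(alpha * N) values, i.e. an empirical lower-tail CVaR."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def percentile_entropy_objective(states_per_env: Sequence[Sequence[int]], alpha: float) -> float:
    """Hypothetical helper: mean entropy over the alpha-fraction of worst-explored environments."""
    entropies = [empirical_entropy(states) for states in states_per_env]
    return cvar_lower(entropies, alpha)


if __name__ == "__main__":
    # Three toy environments, each summarized by the states visited by one policy.
    samples = [
        [0, 1, 2, 3],  # uniform over 4 states: entropy log 4
        [0, 0, 0, 0],  # stuck in one state: entropy 0
        [0, 0, 1, 1],  # uniform over 2 states: entropy log 2
    ]
    # With alpha = 0.5, the worst ceil(1.5) = 2 environments are averaged: (0 + log 2) / 2.
    print(percentile_entropy_objective(samples, alpha=0.5))
```

Setting α = 1 recovers the plain average entropy across environments, while smaller α concentrates the objective on the environments that the strategy explores worst, which is the risk-averse trade-off the abstract describes.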
