Online Learning in Non-Cooperative Configurable Markov Decision Process

Authors:

Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli

Conference:

AAAI 2021

Abstract:

In Configurable Markov Decision Processes, there are two entities: a Reinforcement Learning agent and a configurator that can modify some parameters of the environment to improve the agent's performance. What if the configurator does not have the same intentions as the agent? In this paper, we introduce the Non-Cooperative Configurable Markov Decision Process, a framework that allows the configurator and the agent to have two (possibly different) reward functions. In this setting, we consider an online learning problem in which the configurator has to find the best configuration among a finite set of possible configurations. We propose a learning algorithm that exploits the structure of the problem to minimize the configurator's expected regret. While a naïve application of the UCB algorithm yields a regret that grows indefinitely over time, we show that our approach suffers only bounded regret. Furthermore, we empirically evaluate the performance of our algorithm in simulated domains.
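The abstract contrasts the proposed method with a naïve application of UCB. As a rough illustration of that baseline only (not the authors' algorithm), the sketch below treats each of the finite configurations as an arm of a bandit and runs standard UCB1. It assumes, hypothetically, that after the configurator picks a configuration the agent acts in it and the configurator observes a reward in [0, 1]; the sampler `sample_reward` and the function name `ucb1_configurator` are illustrative assumptions. Because UCB1 keeps exploring every suboptimal configuration, its regret grows with the horizon, which is the behavior the abstract says the structured approach avoids.

```python
import math
import random
from typing import Callable, List


def ucb1_configurator(
    sample_reward: Callable[[int], float],  # hypothetical: configurator reward in [0, 1] for configuration i
    n_configs: int,
    horizon: int,
) -> List[int]:
    """Naive UCB1 over a finite set of configurations; returns how often each was chosen."""
    counts = [0] * n_configs
    means = [0.0] * n_configs
    for t in range(1, horizon + 1):
        if t <= n_configs:
            # Initialization: try every configuration once.
            i = t - 1
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks with counts.
            i = max(
                range(n_configs),
                key=lambda k: means[k] + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        r = sample_reward(i)
        counts[i] += 1
        # Incremental update of the empirical mean reward of configuration i.
        means[i] += (r - means[i]) / counts[i]
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.2, 0.5, 0.8]  # hypothetical Bernoulli reward means per configuration
    print(ucb1_configurator(lambda i: float(rng.random() < probs[i]), len(probs), 5000))
```

In this toy run, most choices should concentrate on the configuration with the highest mean. The other configurations are still selected a number of times that grows logarithmically with the horizon, so the cumulative regret keeps increasing.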
