Year: 2022

Machine Learning

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Authors: Amarildo Likmeta, Alberto Maria Metelli, Giorgia Ramponi, Andrea Tirinzoni, Matteo Giuliani, Marcello Restelli

Abstract: In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can […]
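
The full method is behind the link; as a hedged, minimal sketch of the gradient-based IRL idea this line of work builds on (not the paper's algorithm for multiple experts or non-stationarity), the snippet below recovers linear reward weights as the direction in which an expert's estimated policy gradient vanishes. The matrix G and the toy dimensions are stand-ins, not data from the paper.

```python
import numpy as np

# Hypothetical toy setup: reward is linear in q features, r(s, a) = phi(s, a) @ omega.
# If the expert is (near-)optimal, the policy gradient w.r.t. its own parameters is
# approximately zero under the true reward, so we search for the weight vector omega
# minimizing ||G @ omega||, where column j of G is the policy-gradient estimate
# obtained from demonstrations under feature j alone.

rng = np.random.default_rng(0)
d, q = 5, 3                      # policy-parameter and reward-feature dimensions
G = rng.normal(size=(d, q))      # stand-in for gradients estimated from demonstrations

# Minimize ||G w||^2 subject to ||w|| = 1: the smallest right singular vector of G.
_, _, vt = np.linalg.svd(G)
omega = vt[-1]

print("recovered reward weights:", np.round(omega, 3))
print("residual gradient norm:  ", np.linalg.norm(G @ omega))
```
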
Nature

Quantum compiling by deep reinforcement learning

Authors: Lorenzo Moro, Matteo G. A. Paris, Marcello Restelli, Enrico Prati

Abstract: The general problem of quantum compiling is to approximate any unitary transformation that describes the quantum computation as a sequence of elements selected from a finite base of universal quantum gates. The Solovay-Kitaev theorem guarantees the existence of such an approximating sequence. Though, […]
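
As a hedged illustration of the quantum-compiling objective the abstract describes (not the paper's deep-RL agent), this sketch exhaustively searches short sequences over the finite universal base {H, T} for the one closest to a target unitary; the distance function and target are illustrative choices.

```python
import numpy as np
from itertools import product

# Universal single-qubit base: Hadamard and T generate a dense set of unitaries.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])
BASE = {"H": H, "T": T}

def distance(U, V):
    """Phase-invariant distance between two 2x2 unitaries (0 iff V = e^{i t} U)."""
    inner = abs(np.trace(U.conj().T @ V)) / 2.0
    return np.sqrt(max(0.0, 1.0 - inner))

target = np.array([[1, 0], [0, 1j]])   # the S gate, expressible exactly as T @ T

best_seq, best_d = None, np.inf
for length in range(1, 5):             # exhaustive search over short sequences
    for seq in product(BASE, repeat=length):
        U = np.eye(2, dtype=complex)
        for name in seq:
            U = BASE[name] @ U
        d = distance(U, target)
        if d < best_d:
            best_seq, best_d = seq, d

print("best sequence:", best_seq, "distance:", round(best_d, 6))
```

Exhaustive search is exponential in sequence length, which is precisely why the paper turns to a learned (deep-RL) policy for gate selection.
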
PMLR

Provably Efficient Learning of Transferable Rewards

Authors: Alberto Maria Metelli, Giorgia Ramponi, Alessandro Concetti, Marcello Restelli

Abstract: The reward function is widely accepted as a succinct, robust, and transferable representation of a task. Typical approaches, at the core of Inverse Reinforcement Learning (IRL), leverage expert demonstrations to recover a reward function. In this paper, we study the theoretical properties of the class of […]
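
For intuition about what makes a reward "recoverable" from an expert, here is a minimal tabular sketch (a hypothetical two-state MDP and candidate reward, not the paper's construction): it checks whether a candidate reward makes the expert's policy greedy-optimal, i.e., whether the reward is compatible with the demonstrated behavior.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[s, a, s'] transition probabilities.
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.8, 0.2], [0.3, 0.7]],
])
expert = np.array([0, 1])            # the expert's action in each state
gamma = 0.9

def q_star(R, iters=500):
    """Tabular value iteration for Q* under reward R[s, a]."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

candidate = np.array([[1.0, 0.0], [0.0, 1.0]])   # a reward aligned with the expert
Q = q_star(candidate)
print("expert greedy everywhere:", np.array_equal(Q.argmax(axis=1), expert))
```
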
PMLR

Leveraging Good Representations in Linear Contextual Bandits

Authors: Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

Abstract: The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation. However, a contextual bandit problem may admit multiple linear representations, each one with different characteristics that directly impact the regret of the learning algorithm. In particular, recent works […]
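
For a fixed representation, the standard LinUCB-style algorithm that such analyses revolve around fits in a few lines; the environment below (random Gaussian arm features, an illustrative noise level) is a stand-in, and how to choose among several available representations is exactly what the paper studies.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 4, 5, 2000
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)       # unknown parameter of the linear reward model

A = np.eye(d)                        # ridge-regularized Gram matrix
b = np.zeros(d)
alpha = 1.0                          # exploration-bonus scale
total_regret = 0.0

for t in range(T):
    X = rng.normal(size=(K, d))                      # arm features for this round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                            # ridge estimate of theta
    bonus = np.sqrt(np.einsum("ki,ij,kj->k", X, A_inv, X))
    k = int(np.argmax(X @ theta_hat + alpha * bonus))
    reward = X[k] @ theta + 0.1 * rng.normal()
    total_regret += (X @ theta).max() - X[k] @ theta
    A += np.outer(X[k], X[k])                        # update sufficient statistics
    b += reward * X[k]

print("cumulative regret over", T, "rounds: %.2f" % total_regret)
```
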
OpenReview

Meta Learning the Step Size in Policy Gradient Methods

Authors: Luca Sabbioni, Francesco Corda, Marcello Restelli

Abstract: Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their solid theoretical grounding and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance and, as a consequence, they tend to struggle when […]
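
As a hedged sketch of the underlying issue (step-size sensitivity in policy gradient) rather than the paper's meta-learning procedure, the snippet below runs REINFORCE on a toy continuous bandit and adapts the step size online with a simple hypergradient rule; all constants are illustrative.

```python
import numpy as np

# Toy continuous bandit: action a ~ N(theta, sigma^2), reward = -(a - 3)^2.
rng = np.random.default_rng(2)
theta, sigma = 0.0, 0.5
alpha, beta = 0.05, 1e-5          # initial step size and meta step size
baseline, prev_g = 0.0, 0.0

for t in range(5000):
    a = theta + sigma * rng.normal()
    r = -(a - 3.0) ** 2
    g = (a - theta) / sigma**2 * (r - baseline)   # REINFORCE estimate with baseline
    # Hypergradient rule: grow the step size while successive gradients agree,
    # shrink it when they point in opposite directions.
    alpha = float(np.clip(alpha + beta * g * prev_g, 1e-5, 0.5))
    theta += alpha * g
    baseline += 0.05 * (r - baseline)             # running average of rewards
    prev_g = g

print("learned mean action:", round(theta, 3), "final step size:", round(alpha, 5))
```
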
OpenReview

Learning to Explore a Class of Multiple Reward-Free Environments

Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli

Abstract: Several recent works have been dedicated to the pure exploration of a single reward-free environment. Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement […]
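
One plausible way to score a single exploration policy across a class of environments (an assumption for illustration, not necessarily the paper's exact objective) is to estimate the state-visitation entropy it induces in each sampled environment and then aggregate, e.g., by the mean or by a low percentile for worst-case robustness:

```python
import numpy as np

rng = np.random.default_rng(3)

def visitation_entropy(counts):
    """Entropy of the empirical state-visitation distribution."""
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

# Stand-in for rollouts: visit counts over 20 states in 8 sampled environments.
entropies = []
for _ in range(8):
    counts = rng.integers(0, 50, size=20).astype(float)
    entropies.append(visitation_entropy(counts))

print("mean entropy:         ", round(float(np.mean(entropies)), 3))
print("10th-percentile score:", round(float(np.percentile(entropies, 10)), 3))
```
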
OpenReview

A Policy Gradient Method for Task-Agnostic Exploration

Authors: Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. In particular, we […]
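
The intrinsic objective the abstract names, the entropy of the induced state distribution, can be estimated nonparametrically from visited states. Below is a hedged k-nearest-neighbor sketch (additive constants dropped, toy data, not the paper's estimator as published) showing that spread-out visitation scores higher than clumped visitation:

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_entropy(states, k=4):
    """Kozachenko-Leonenko-style entropy estimate, up to additive constants."""
    n, d = states.shape
    dists, _ = cKDTree(states).query(states, k=k + 1)  # first hit is the point itself
    radii = dists[:, -1] + 1e-12                       # distance to k-th neighbor
    return d * np.mean(np.log(radii)) + np.log(n)

rng = np.random.default_rng(4)
spread_out = rng.uniform(-1, 1, size=(500, 2))   # good coverage of the state space
clumped = 0.05 * rng.normal(size=(500, 2))       # poor coverage

print("entropy (spread): ", round(knn_entropy(spread_out), 3))
print("entropy (clumped):", round(knn_entropy(clumped), 3))
```
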
IEEE Xplore

Inferring Functional Properties from Fluid Dynamics Features

Authors: Andrea Schillaci, Maurizio Quadrio, Carlotta Pipolo, Marcello Restelli, Giacomo Boracchi

Abstract: In a wide range of applied problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information on the flow field, at varying levels of fidelity and computational cost. However, CFD alone cannot predict high-level functional properties that are not easily obtained […]
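
A minimal sketch of the kind of pipeline the abstract suggests, assuming hypothetical summary features extracted from CFD runs and a synthetic target (none of this is the paper's data or model):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical setup: each row holds summary features from one CFD simulation
# (e.g., mean wall shear stress, pressure drop, peak velocity), and the target is
# a high-level functional property that CFD alone does not provide.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 6))                              # stand-in flow features
y = X[:, 0] * 2.0 - X[:, 3] + 0.1 * rng.normal(size=200)   # stand-in property

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```
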
JMLR

Gaussian Approximation for Bias Reduction in Q-Learning

Authors: Carlo D’Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli

Abstract: Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (RL). Within this family, Q-Learning is arguably the most famous one, which has been widely studied and extended. The update rule of Q-Learning involves the use of the […]
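
For intuition on why a Gaussian approximation helps: Q-Learning's hard max over noisy estimates is biased upward, and replacing it with an average weighted by each action's probability of being the true maximizer shrinks that bias. Below is a hedged Monte Carlo sketch of such a weighted target (illustrative numbers, not the paper's closed-form estimator):

```python
import numpy as np

rng = np.random.default_rng(6)

def max_prob_weights(means, stds, samples=2000):
    """P(action a is the argmax) under independent Gaussians, by Monte Carlo."""
    draws = rng.normal(means, stds, size=(samples, len(means)))
    return np.bincount(draws.argmax(axis=1), minlength=len(means)) / samples

# Next-state action-value estimates and their uncertainties (stand-in numbers).
means = np.array([1.0, 0.9, 0.2])
stds = np.array([0.5, 0.5, 0.1])

w = max_prob_weights(means, stds)
print("hard max target:", means.max())
print("weighted target:", float(w @ means))   # less prone to overestimation
```

The weighted target would then replace `max_a Q(s', a)` in the usual temporal-difference update, trading Q-Learning's systematic overestimation for a softer, uncertainty-aware estimate.
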