Research

JMLR

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach

Authors: Alberto Maria Metelli, Matteo Pirotta, Daniele Calandriello, Marcello Restelli

Abstract: This paper presents a study of the policy improvement step that can be usefully exploited by approximate policy-iteration algorithms. When either the policy evaluation step or the policy improvement step returns an approximate result, the sequence of policies produced by policy iteration may not […]
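As background for the abstract above: exact policy iteration alternates policy evaluation and greedy improvement, and with exact computations each iterate is at least as good as the previous one; the paper studies how to preserve improvement when either step is approximate. A minimal sketch of the exact procedure on a hypothetical 2-state, 2-action MDP (all transition and reward numbers are made up for illustration):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[s, a, s'] are transition
# probabilities, R[s, a] are immediate rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9
policy = np.zeros(2, dtype=int)  # start with action 0 in every state

for _ in range(20):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi
    P_pi = P[np.arange(2), policy]
    R_pi = R[np.arange(2), policy]
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
    # Greedy policy improvement on the action-value function
    Q = R + gamma * P @ V
    new_policy = np.argmax(Q, axis=1)
    if np.array_equal(new_policy, policy):
        break  # greedy policy is stable: optimal
    policy = new_policy
```

With exact evaluation, each greedy step can only increase every state's value; replacing either step with an approximation is precisely where the monotonicity studied in the paper can be lost.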
ICML 2021

Subgaussian Importance Sampling for Off-Policy Evaluation and Learning

Authors: Alberto Maria Metelli, Alessio Russo, Marcello Restelli

Abstract: Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimates whenever the behavioral and target policies are too dissimilar. In this paper, […]
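A minimal sketch of the failure mode the abstract mentions: vanilla IS reweights samples from a behavioral policy by the ratio of target to behavioral densities, and when the two policies are dissimilar these weights become heavy-tailed. The Gaussian policies, reward, and truncation threshold below are illustrative assumptions, not the estimator proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step example: behavioral and target policies are
# Gaussians over a single action; the "reward" is the action itself.
mu_b, mu_t, sigma = 0.0, 2.0, 1.0  # dissimilar means -> heavy-tailed weights

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

actions = rng.normal(mu_b, sigma, size=10_000)  # drawn from the behavioral policy
weights = gaussian_pdf(actions, mu_t, sigma) / gaussian_pdf(actions, mu_b, sigma)

# Vanilla IS estimate of the target policy's expected reward (true value: mu_t)
is_estimate = np.mean(weights * actions)

# Weight truncation: one classical device for taming heavy tails,
# shown only as a contrast, not as the paper's method.
truncated = np.minimum(weights, 10.0)
trunc_estimate = np.mean(truncated * actions)
```

A few extreme weights dominate the vanilla estimate; truncation caps them at the price of bias, which is the kind of trade-off the paper's subgaussian estimators are designed to control.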
ICAPS 2021

Online Planning for F1 Race Strategy Identification

Authors: Diego Piccinotti, Amarildo Likmeta, Nicolò Brunello, Marcello Restelli

Abstract: Formula 1 (F1) racing is one of the most competitive motorsport competitions, involving high-performance single-seater racing vehicles. The result of a race is determined by vehicle and driver performance, as well as by the tire and pit-stop strategy employed in the race. In this work, […]
JMLR

Gaussian Approximation for Bias Reduction in Q-Learning

Authors: Carlo D’Eramo, Andrea Cini, Alessandro Nuara, Matteo Pirotta, Cesare Alippi, Jan Peters, Marcello Restelli

Abstract: Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (RL). Within this family, Q-Learning is arguably the most famous one, and it has been widely studied and extended. The update rule of Q-Learning involves the use of the […]
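For reference, the standard tabular Q-Learning update the abstract refers to, whose max operator over next-state values is the usual source of the overestimation bias; the toy MDP size, learning rate, and discount factor below are hypothetical:

```python
import numpy as np

# Tabular Q-Learning on a toy 2-state, 2-action problem (illustrative only).
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def q_learning_update(Q, s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    # The max over noisy estimates tends to overestimate the true value.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# One transition: from state 0, action 1 yields reward 1.0 and lands in state 1.
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Starting from a zero table, this single update moves Q(0, 1) halfway to its target of 1.0, i.e. to 0.5.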
IEEE Xplore

Inferring Functional Properties from Fluid Dynamics Features

Authors: Andrea Schillaci, Maurizio Quadrio, Carlotta Pipolo, Marcello Restelli, Giacomo Boracchi

Abstract: In a wide range of applied problems involving fluid flows, Computational Fluid Dynamics (CFD) provides detailed quantitative information on the flow field, at variable levels of fidelity and computational cost. However, CFD alone cannot predict high-level functional properties that are not easily obtained […]
OpenReview

A Policy Gradient Method for Task-Agnostic Exploration

Authors: Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. In particular, we […]
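A rough Monte Carlo sketch of the objective discussed above: estimate the entropy of the empirical state distribution induced by finite-horizon trajectories. The cyclic random-walk environment and the uniform policy below are stand-ins for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, horizon, n_trajectories = 5, 10, 200

# Roll out limited-horizon trajectories under a uniform random policy
# on a hypothetical cyclic chain, counting state visits.
visits = np.zeros(n_states)
for _ in range(n_trajectories):
    s = 0
    for _ in range(horizon):
        visits[s] += 1
        s = (s + rng.choice([-1, 1])) % n_states  # move left or right

p = visits / visits.sum()                       # empirical state distribution
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))  # Shannon entropy of the visits
```

A task-agnostic exploration objective then searches for the policy maximizing this entropy, whose maximum here is log(n_states), attained by a uniform state distribution.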
OpenReview

Learning to Explore a Class of Multiple Reward-Free Environments

Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli

Abstract: Several recent works have been dedicated to the pure exploration of a single reward-free environment. Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement […]
AAAI

Newton Optimization on Helmholtz Decomposition for Continuous Games

Authors: Giorgia Ramponi, Marcello Restelli

Abstract: Many learning problems involve multiple agents that optimize different interactive functions. In these problems, standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, the learning algorithms must consider the complex dynamics of these systems to guarantee rapid […]
OpenReview

Learning to Explore Multiple Environments without Rewards

Authors: Mirco Mutti, Mattia Mancassola, Marcello Restelli

Abstract: Several recent works have been dedicated to the pure exploration of a single reward-free environment. Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement […]
OpenReview

Meta Learning the Step Size in Policy Gradient Methods

Authors: Luca Sabbioni, Francesco Corda, Marcello Restelli

Abstract: Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical grounding and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance and, as a consequence, they tend to struggle when […]
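The step-size sensitivity motivating the abstract can be seen on a hypothetical one-dimensional problem: a Gaussian policy with mean theta and a quadratic reward, updated with a plain REINFORCE gradient estimate. None of this is the paper's meta-learning procedure; it only illustrates the role of the step size alpha:

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_step(theta, alpha, n_samples=1000, sigma=1.0):
    # Gaussian policy a ~ N(theta, sigma^2); reward peaks at a = 3.
    actions = rng.normal(theta, sigma, size=n_samples)
    rewards = -(actions - 3.0) ** 2
    # REINFORCE: grad log N(a; theta, sigma) w.r.t. theta is (a - theta) / sigma^2
    grad = np.mean(rewards * (actions - theta) / sigma**2)
    return theta + alpha * grad

theta = 0.0
for _ in range(50):
    theta = reinforce_step(theta, alpha=0.1)  # alpha hand-picked here
```

For this quadratic objective the expected update contracts toward the optimum only for small enough alpha (here alpha < 1 with sigma = 1); larger values oscillate or diverge, which is the tuning burden a meta-learned step size aims to remove.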