Research

OpenReview

Policy Optimization via Optimal Policy Evaluation

Authors: Alberto Maria Metelli, Samuele Meta, Marcello Restelli
Abstract: Off-policy methods are the basis of a large number of effective Policy Optimization (PO) algorithms. In this setting, Importance Sampling (IS) is typically employed as a what-if analysis tool, with the goal of estimating the performance of a target policy, given samples collected with a different […]
PMLR

Time-variant variational transfer for value functions

Authors: Giuseppe Canonaco, Andrea Soprani, Matteo Giuliani, Andrea Castelletti, Manuel Roveri, Marcello Restelli
Abstract: In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary. Therefore, the target and source tasks are i.i.d. samples of the same distribution. Unfortunately, this assumption rarely holds in real-world […]
NeurIPS Proceedings

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

Authors: Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
Abstract: We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure. We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant […]
NeurIPS Proceedings

Learning in Non-Cooperative Configurable Markov Decision Processes

Authors: Giorgia Ramponi, Alberto Maria Metelli, Alessandro Concetti, Marcello Restelli
Abstract: The Configurable Markov Decision Process framework includes two entities: a Reinforcement Learning agent and a configurator that can modify some environmental parameters to improve the agent’s performance. This presupposes that the two actors have the same reward functions. What if the configurator does not […]
NeurIPS Proceedings

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

Authors: Alberto Maria Metelli, Alessio Russo, Marcello Restelli
Abstract: Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimation and learning algorithms. However, empirical and theoretical studies have progressively shown that vanilla IS leads to poor estimations whenever the behavioral and target policies are too dissimilar. In this paper, […]
Journal of Energy Storage

A voltage dynamic-based state of charge estimation method for batteries storage systems

Authors: Marco Mussi, Luigi Pellegrino, Marcello Restelli, Francesco Trovò
Abstract: In recent years, the use of Lithium-ion batteries in smart power systems and hybrid/electric vehicles has become increasingly popular since they provide a flexible and cost-effective way to store and deliver power. Their full integration into more complex systems requires an accurate estimate of the […]
AGU Fall Meeting 2021

Advancing drought monitoring via feature extraction

Authors: Alberto Metelli, Andrea Castelletti, Marcello Restelli
Abstract: A drought is a slowly developing natural phenomenon that can occur in all climatic zones and can be defined as a temporary but significant decrease in water availability. Over the past three decades, the cost of droughts in Europe has amounted to over 100 billion euros and […]
AAAI

Policy Optimization as Online Learning with Mediator Feedback

Authors: Alberto Maria Metelli, Matteo Papini, Pierluca D’Oro, Marcello Restelli
Conference: AAAI 2021
Abstract: Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the […]

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

Authors: Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli
Conference: AAAI 2021
Abstract: In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy […]
AAAI

Learning Probably Approximately Correct Maximin Strategies in Games with Infinite Strategy Spaces

Authors: Alberto Marchesi, Francesco Trovò, Nicola Gatti
Conference: AAAI 2021
Abstract: We tackle the problem of learning equilibria in simulation-based games. In such games, the players’ utility functions cannot be described analytically, as they are given through a black-box simulator that can be […]