Driving exploration by maximum distribution in gaussian process bandits

Authors:

Alessandro Nuara, Francesco Trovò, Dominic Crippa, Nicola Gatti, Marcello Restelli

Conference:

AAMAS 2020

Abstract:

The problem of finding optimal solutions of stochastic functions over continuous domains is common in several real-world applications, such as, e.g., advertisement allocation, dynamic pricing, and power control in wireless networks. The optimization process is customarily performed by selecting input points sequentially and receiving a noisy observation from the function. In this paper, we resort to the Multi-Armed Bandit approach, aiming at optimizing stochastic functions when keeping at a pace the regret (i.e., the loss incurred during the learning process) during the learning process. In particular, we focus on smooth stochastic functions, as it is known that any algorithm suffers from a constant per-round regret when the domain is continuous, and the function does not satisfy any kind of regularity. Our main original contribution is the provision of a general family of algorithms, which, under the mild assumption that stochastic functions are a realization of a Gaussian Process, provides a regret of the order of O(√γTT,being γT the maximum information gain and T the time horizon used for the learning process. Furthermore, we design a specific algorithm of our family, called DAGP-UCB, which exploits the structure of GPs to select the next arm to pull more effectively than the previous algorithms available in the state of the art, thus speeding up the learning process. In particular, we show the superior performance of DAGP-UCB in both synthetic and applicative settings, comparing it with the state-of-the-art algorithms.

Full paper

Driving exploration by maximum distribution in gaussian process bandits

Driving exploration by maximum distribution in gaussian process bandits

Authors:

Conference:

Abstract:

I3Lung: cure mediche personalizzate basate sull’intelligenza artificiale

Machine Learning Models Life Cycle

Configurable Environments in Reinforcement Learning: An Overview

Bayesian Persuasion in Online Settings

Multi-Receiver Online Bayesian Persuasion

Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results

Bayesian Agency: Linear versus Tractable Contracts

Election Manipulation on Social Networks: Seeding, Edge Removal, Edge Addition