Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

Abstract

In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous in realworld economic scenarios. However, the seminal model by Kamenica and Gentzkow makes some stringent assumptions that limit its applicability in practice. One of the most limiting assumptions is, arguably, that the sender is required to know the receiver’s utility function to compute an optimal signaling scheme. We relax this assumption through an online learning framework in which the sender repeatedly faces a receiver whose type is unknown and chosen adversarially at each round from a finite set of possible types. We are interested in no-regret algorithms prescribing a signaling scheme at each round of the repeated interaction with performances close to that of a best-in-hindsight signaling scheme. First, we prove a hardness result on the per-round running time required to achieve no-α-regret for any α < 1. Then, we provide algorithms for the full and partial feedback models with regret bounds sublinear in the number of rounds and polynomial in the size of the instance.