Matteo Castiglioni, Alberto Marchesi, Andrea Celli, Nicola Gatti
Bayesian persuasion studies how an informed sender should partially disclose information to influence the behavior of a self-interested receiver. Classical models make the stringent assumption that the sender knows the receiver’s utility. This assumption can be relaxed by considering an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We study, for the first time, an online Bayesian persuasion setting with multiple receivers. We focus on the case with no externalities and binary actions, as is customary in offline models. Our goal is to design no-regret algorithms for the sender with polynomial per-iteration running time. First, we prove a negative result: for any 0 < α ≤ 1, there is no polynomial-time no-α-regret algorithm when the sender’s utility function is supermodular or anonymous. Then, we focus on the setting in which the sender’s utility function is submodular, and we show that, in this case, it is possible to design a polynomial-time no-α-regret algorithm. To do so, we introduce a general online gradient descent framework to handle online learning problems with a finite number of possible loss functions. This framework requires an approximate projection oracle. We show that, in our setting, such an oracle exists and can be implemented in polynomial time.