The Evolutionary Dynamics of Soft-Max Policy Gradient in Games

Authors

Martino Bernasconi, Federico Cacciamani, Simone Fioravanti, Nicola Gatti, Francesco Trovò

Abstract

In this paper, we study the mean dynamics of the soft-max policy gradient algorithm in multi-agent settings by resorting to evolutionary game theory and dynamical system tools. Such a study is crucial to understand the algorithm's weaknesses when employed in multi-agent settings. Unlike most multi-agent reinforcement learning algorithms, whose mean dynamics are a slight variant of the replicator dynamics that does not affect the properties of the original dynamics, the soft-max policy gradient dynamics present a structure significantly different from that of the replicator. Indeed, the dynamics are equivalent to the replicator dynamics in a different game derived by a non-convex transformation of the payoffs of the original game. First, we recover the properties (already known for the discrete-time soft-max policy gradient) for the continuous-time mean dynamics in the case of learning a best response. As commonly happens, the continuous-time dynamics allow for a simpler analysis and a deeper understanding of the algorithm, which we use to fully characterize the dynamics and improve its theoretical understanding. Then, we resort to models based on single- and multi-population games, showing that the dynamics preserve volume and proving that, in arbitrary instances, it is not possible to obtain last-iterate convergence when the equilibrium of the game is fully mixed. Furthermore, we give empirical evidence that trajectories starting from nearby initial points may diverge over time, thus showing that the behaviour of the dynamics in games with fully-mixed equilibrium is chaotic.
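The equivalence with a replicator dynamics on transformed payoffs can be checked numerically in the simplest setting. The sketch below is not taken from the paper: it assumes a single agent facing a fixed payoff vector `u` (a hypothetical stand-in for the payoffs induced by the opponents), uses the standard soft-max gradient of the expected payoff, and compares the strategy velocity it induces with a replicator vector field whose payoffs are `x_i * (u_i - avg)`. That transformed payoff is our own derivation from the chain rule, and all function names are illustrative.

```python
import math

def softmax(theta):
    # Numerically stable soft-max: x_i = exp(theta_i) / sum_j exp(theta_j).
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(theta, u):
    # Gradient of the expected payoff sum_i x_i * u_i with respect to theta,
    # where x = softmax(theta): component i equals x_i * (u_i - avg).
    x = softmax(theta)
    avg = sum(xi * ui for xi, ui in zip(x, u))
    return [xi * (ui - avg) for xi, ui in zip(x, u)]

def induced_strategy_velocity(theta, u):
    # dx/dt obtained by pushing d(theta)/dt = policy_gradient through the
    # soft-max Jacobian dx_i/dtheta_j = x_i * (delta_ij - x_j).
    x = softmax(theta)
    g = policy_gradient(theta, u)
    n = len(x)
    return [
        sum(x[i] * ((1.0 if i == j else 0.0) - x[j]) * g[j] for j in range(n))
        for i in range(n)
    ]

def replicator_on_transformed_payoffs(x, u):
    # Replicator vector field x_i * (v_i - sum_j x_j * v_j) with the
    # strategy-dependent payoffs v_i = x_i * (u_i - avg) (assumed form).
    avg = sum(xi * ui for xi, ui in zip(x, u))
    v = [xi * (ui - avg) for xi, ui in zip(x, u)]
    v_avg = sum(xi * vi for xi, vi in zip(x, v))
    return [xi * (vi - v_avg) for xi, vi in zip(x, v)]

theta = [0.3, -1.2, 0.7]   # arbitrary logits, for illustration only
u = [1.0, 0.0, -2.0]       # arbitrary fixed payoffs, for illustration only
lhs = induced_strategy_velocity(theta, u)
rhs = replicator_on_transformed_payoffs(softmax(theta), u)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The printed gap should be at floating-point rounding level, since the two expressions are algebraically identical. Because the transformed payoffs depend on the current strategy `x`, the resulting game differs from the original one, which is the structural difference from the plain replicator that the abstract points to.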
