Learning to Cooperate in Non-cooperative Games


Speaker

Xiaowei Zhang


Abstract

This study explores the emergence of cooperative behavior in algorithmic decision-making systems powered by reinforcement learning in competitive environments. We examine this phenomenon through the lens of the repeated prisoner's dilemma, where cooperation can arise from initially non-cooperative interactions. Assuming both players adopt actor-critic reinforcement learning algorithms, we provide theoretical guarantees of convergence to cooperative equilibria. Our investigation extends to financial markets, focusing on automated market-makers (AMMs). We demonstrate that AMMs using actor-critic algorithms can reach either cooperative or Nash-Bertrand equilibria, depending on their initial propensity to cooperate. Notably, this cooperative behavior may be interpreted as a form of algorithmic collusion, raising important regulatory questions. These findings reveal how cooperative outcomes can be sustained through endogenously learned policies, providing crucial insights into algorithmic decision-making in financial markets and beyond.
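The setting in the abstract can be illustrated with a minimal, self-contained sketch. This is not the paper's actual algorithm or parameterization: it is an assumed toy model in which two tabular actor-critic learners play the repeated prisoner's dilemma, each conditioning its cooperation probability on the previous round's joint action (so history-dependent strategies, which can sustain cooperation, are representable). The payoff values, learning rates, discount factor, and initial logits are all illustrative choices.

```python
import math
import random

# Prisoner's dilemma payoffs for (own action, opponent action),
# with the standard ordering T=5 > R=3 > P=1 > S=0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Memory-one states: the start of play, or the previous round's
# (own action, opponent action) pair.
STATES = ["start", "CC", "CD", "DC", "DD"]

class ActorCritic:
    """Tabular actor-critic over memory-one states of the repeated game."""

    def __init__(self, init_logit, lr_actor=0.02, lr_critic=0.05, gamma=0.95):
        self.logit = {s: init_logit for s in STATES}  # actor: logit of P(cooperate)
        self.value = {s: 0.0 for s in STATES}         # critic: state-value estimates
        self.lr_actor, self.lr_critic, self.gamma = lr_actor, lr_critic, gamma

    def p_coop(self, s):
        return 1.0 / (1.0 + math.exp(-self.logit[s]))

    def act(self, s):
        return "C" if random.random() < self.p_coop(s) else "D"

    def update(self, s, action, reward, s_next):
        # Critic: one-step TD update of the state value.
        td = reward + self.gamma * self.value[s_next] - self.value[s]
        self.value[s] += self.lr_critic * td
        # Actor: policy-gradient step on the logit, using the TD error
        # as the advantage signal; grad is d log pi(action|s) / d logit.
        p = self.p_coop(s)
        grad = (1.0 - p) if action == "C" else -p
        self.logit[s] += self.lr_actor * td * grad

random.seed(1)
# A high initial logit encodes a strong initial propensity to cooperate,
# the quantity the abstract says determines which equilibrium is reached.
a, b = ActorCritic(init_logit=3.0), ActorCritic(init_logit=3.0)
s_a = s_b = "start"
for _ in range(50000):
    act_a, act_b = a.act(s_a), b.act(s_b)
    next_a, next_b = act_a + act_b, act_b + act_a  # each agent's own view
    a.update(s_a, act_a, PAYOFF[(act_a, act_b)], next_a)
    b.update(s_b, act_b, PAYOFF[(act_b, act_a)], next_b)
    s_a, s_b = next_a, next_b

for s in STATES:
    print(f"state {s}: P(cooperate) agent A = {a.p_coop(s):.2f}")
```

Depending on the initial logits and random seed, the learned per-state cooperation probabilities can settle near mutual cooperation or drift toward mutual defection, which is the qualitative dichotomy the talk's theoretical results make precise.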

About Xiaowei

Xiaowei Zhang is an Associate Professor in the Department of Industrial Engineering and Decision Analytics at the Hong Kong University of Science and Technology. He earned his Ph.D. in Management Science and Engineering in 2011 and his M.S. in Financial Mathematics in 2010, both from Stanford University, and his B.S. in Mathematics in 2006 from Nankai University. His research focuses on methodological advances in stochastic simulation and optimization, decision analytics, and reinforcement learning, with applications in service operations management, financial technology, and the digital economy. He currently serves as an Associate Editor for Management Science and Operations Research.

This seminar will take place in person in room T09-67. Alternatively, click here to join the seminar via Zoom.