DSpace@MIT

Strategizing against online learners in normal form repeated games

Author(s)
Assos, Angelos
Advisor
Daskalakis, Constantinos
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
With the advent of machine learning and AI, learning algorithms are increasingly prevalent in online settings that require sequential decision-making. In such settings, each agent's decisions can affect the utilities (or losses) of the other agents, and can also influence the decisions those agents make later in the interaction. Therefore, an agent that is good at anticipating the behavior of the other agents, in particular how they make decisions in each round as a function of their experience thus far, can make its own decisions judiciously over the rounds of the interaction so as to influence the other agents to behave in a way that ultimately benefits its own utility. In this thesis, we study repeated two-player games between two agents: a learner, who employs an online learning algorithm to choose a strategy in each round, and an optimizer, who knows the learner's utility function, parameters, and online learning algorithm. The optimizer wants to plan ahead to maximize its own utility while taking the learner's behavior into account. We study this setting in zero-sum and general-sum games. In zero-sum games, we provide algorithms for the optimizer that efficiently exploit a learner employing a specific online learning algorithm, in both discrete-time and continuous-time dynamics: the learner uses the Multiplicative Weights Update (MWU) algorithm in the discrete-time games and Replicator Dynamics in the continuous-time games. In general-sum games, we provide a negative result: unless P = NP, there is no Fully Polynomial-Time Approximation Scheme (FPTAS) for maximizing the utility of an optimizer against a learner that best responds to the history in each round. We additionally provide exponential-time algorithms for strategizing against a learner that uses MWU, as well as a new way of thinking about strategizing against online learners via the calculus of variations.
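To make the setting concrete, the following is a minimal sketch (not taken from the thesis) of the interaction the abstract describes: a learner running MWU in a zero-sum game given by a payoff matrix A, where A[i, j] is the optimizer's utility and the learner's loss. The learning rate eta, horizon T, and the myopically best-responding optimizer are illustrative assumptions; the thesis's optimizer plans ahead over the whole horizon rather than best-responding round by round.

```python
import numpy as np

def mwu_learner_vs_greedy_optimizer(A, T=1000, eta=0.1):
    """Simulate T rounds of a zero-sum repeated game.

    A[i, j]: optimizer's utility (learner's loss) when the learner
    plays row i and the optimizer plays column j. The greedy
    optimizer here is an illustrative stand-in, not the thesis's
    planning optimizer.
    """
    n, m = A.shape            # n learner actions, m optimizer actions
    weights = np.ones(n)      # MWU maintains one weight per learner action
    total_optimizer_utility = 0.0
    for _ in range(T):
        x = weights / weights.sum()   # learner's mixed strategy this round
        # Illustrative optimizer: myopically best-respond to x.
        j = int(np.argmax(x @ A))     # column maximizing expected utility
        total_optimizer_utility += float(x @ A[:, j])
        # MWU update: the learner's loss for action i is A[i, j],
        # so exponentially down-weight actions with high loss.
        weights *= np.exp(-eta * A[:, j])
    return total_optimizer_utility / T

# Example: matching pennies. Against this myopic optimizer, the
# optimizer's average utility stays near the game's value (0).
if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(mwu_learner_vs_greedy_optimizer(A))
```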
Date issued
2025-02
URI
https://hdl.handle.net/1721.1/159121
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
