DSpace@MIT

Learning from Weak Supervision: Theory, Methods, and Applications

Author(s)
Lang, Hunter
Advisor
Sontag, David A.
Terms of use
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0). Copyright retained by author(s). https://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract
The growing demand for high-quality labeled data to train machine learning models has driven widespread adoption of weak supervision and synthetic data methods, which use automated models instead of humans for annotation. Large language models (LLMs) have further accelerated this trend because their zero- and few-shot classification performance enables them to serve as effective “synthetic annotators” for various tasks. In practice, the data generated by these weak annotators is imperfect, but it enables the training of strong models. However, theoretical understanding of why training one model on the outputs of another leads to strong performance remains limited, especially when the annotator model exhibits suboptimal performance on the target task. In this thesis, I develop a theoretical framework for learning from weak supervision that captures the key aspects of the problem better than existing approaches in the crowdsourcing and learning-with-noisy-label literature. This framework establishes structural conditions that explain when and why weak supervision can reliably train strong models. Building on these theoretical results, the second part of the thesis introduces methods to improve how models learn from weak supervision and applies these methods to low-labeled-data settings.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/164037
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses