MIT Libraries DSpace@MIT

Toward sequence-to-structure predictions of chromatin: Generative AI sheds light on genome organization

Author(s)
Schuette, Greg
Download: Thesis PDF (52.00 MB)
Advisor
Zhang, Bin
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
The secrets of the genome have captivated scientists for well over a century, but the active role that its spatial organization plays in gene regulation, cell determination, and disease formation has become clear only in recent decades. Significant strides have been made toward characterizing and understanding three-dimensional genome organization, yet the scale, complexity, and heterogeneity of the genome and the nuclear environment complicate investigations of this system. This thesis presents several methodological advances that help address these challenges and could accelerate genome organization research. The first is an efficient Hi-C inversion algorithm. This technique extracts pairwise contact potentials from experimental Hi-C data, uncovering mechanistic details obscured by the correlations among Hi-C contact probabilities. Doing so required developing a spin-glass model of chromatin and deriving a corresponding model inversion; the model may find use in further theoretical studies of chromatin, while the inversion can be applied more broadly. The inversion successfully revealed the locations of chromatin loop anchors, supported phase separation as the mechanism of chromatin compartment formation, and parameterized polymer models that reproduced the experimental Hi-C data with reasonable accuracy. The focus then shifts to ChromoGen, a generative AI model that predicts three-dimensional chromatin structures directly from DNA sequence and chromatin accessibility data. ChromoGen produced biologically accurate structural ensembles throughout the genomes of two cell types, including one omitted from its training data. This transferability suggests that ChromoGen can provide access to chromatin organization in a wide variety of cell types while relying only on widely available sequencing data. We then discuss several strategies for extending ChromoGen to full-chromosome structure prediction. Preliminary results suggest that current technology can provide this capability: we have generated physical chromosome conformations for mouse chromosomes, although sequencing data did not guide this generative process. Accordingly, we explore the possibility of combining a multimodal model with ChromoGen so that it can condition structure generation on a wide variety of data types. Success in this area could enable true de novo structure prediction, greatly simplifying research into the relationship between sequence, structure, and cellular function while also accelerating the development of treatments for diseases that involve chromatin dysregulation.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162304
Department
Massachusetts Institute of Technology. Department of Chemistry
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.