MIT Libraries | DSpace@MIT


Generative Diffusion Models Towards De Novo Protein Design

Author(s)
Yim, Jason
Advisor
Jaakkola, Tommi S.
Barzilay, Regina
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
De novo protein design aims to generate proteins with desired functions by rationally engineering novel protein structures and sequences. Structure requires modeling the continuous 3D coordinates of atoms under the rigid biochemical constraints of the polymer chain, while sequence is a series of discrete amino acids that should fold into a plausible structure. The function-structure-sequence relationship that underlies protein design is complex, but deep learning has shown promise in learning it from large protein datasets. This thesis aims to develop deep learning models that generate novel structures and sequences and can be guided towards desired functions. We first describe novel generative models that learn to generate protein structures and sequences by developing diffusion models over general state spaces, including Riemannian manifolds and discrete tokens. The resulting methods (FrameDiff, FrameFlow, and MultiFlow) demonstrate the ability of diffusion models to extrapolate beyond the training data to generate novel and diverse protein structures and sequences that pass in silico protein design filters. Next, we apply diffusion models to practical protein design challenges by collaborating with experimental and computational biologists to develop RoseTTAFold Diffusion (RFdiffusion). By combining the structure prediction capabilities of RoseTTAFold with diffusion modeling principles, RFdiffusion can generate functional proteins with in vitro validated properties, such as high-affinity binders and symmetric protein assemblies.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/164143
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
