dc.contributor.advisor: Jaakkola, Tommi S.
dc.contributor.advisor: Barzilay, Regina
dc.contributor.author: Yim, Jason
dc.date.accessioned: 2025-12-03T16:10:41Z
dc.date.available: 2025-12-03T16:10:41Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-08-14T19:46:38.712Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164143
dc.description.abstract: De novo protein design aims to generate proteins with desired functions by rationally engineering novel protein structures and sequences. The structure requires modeling continuous 3D coordinates of atoms with rigid biochemical constraints of the polymer chain while the sequence is a series of discrete amino acids that should fold into a plausible structure. Understanding the protein function-structure-sequence relationship necessary for protein design is complex, but deep learning has proven promising to learn the relationship from large protein datasets. This thesis aims to develop deep learning models that generate novel structures and sequences that can be guided towards desired functions. We first describe novel generative models that learn to generate protein structures and sequences by developing diffusion models over general state spaces including Riemannian manifolds and discrete tokens. The resulting methods – FrameDiff, FrameFlow, and MultiFlow – demonstrate the ability of diffusion models to extrapolate beyond the training data to generate novel and diverse protein structures and sequences that pass in silico protein design filters. Next, we apply diffusion models to practical protein design challenges by collaborating with experimental and computational biologists to develop RoseTTAFold Diffusion (RFdiffusion). By combining the structure prediction capabilities of RoseTTAFold and diffusion modeling principles, RFdiffusion can generate functional proteins with in vitro validated properties such as high-affinity binders and symmetric protein assemblies.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Generative Diffusion Models Towards De Novo Protein Design
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

