| dc.contributor.advisor | Jaakkola, Tommi S. | |
| dc.contributor.advisor | Barzilay, Regina | |
| dc.contributor.author | Yim, Jason | |
| dc.date.accessioned | 2025-12-03T16:10:41Z | |
| dc.date.available | 2025-12-03T16:10:41Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-08-14T19:46:38.712Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/164143 | |
| dc.description.abstract | De novo protein design aims to generate proteins with desired functions by rationally engineering novel protein structures and sequences. The structure requires modeling continuous 3D coordinates of atoms with rigid biochemical constraints of the polymer chain while the sequence is a series of discrete amino acids that should fold into a plausible structure. Understanding the protein function-structure-sequence relationship necessary for protein design is complex, but deep learning has proven promising to learn the relationship from large protein datasets. This thesis aims to develop deep learning models that generate novel structures and sequences that can be guided towards desired functions. We first describe novel generative models that learn to generate protein structures and sequences by developing diffusion models over general state spaces including Riemannian manifolds and discrete tokens. The resulting methods – FrameDiff, FrameFlow, and MultiFlow – demonstrate the ability of diffusion models to extrapolate beyond the training data to generate novel and diverse protein structures and sequences that pass in silico protein design filters. Next, we apply diffusion models to practical protein design challenges by collaborating with experimental and computational biologists to develop RoseTTAFold Diffusion (RFdiffusion). By combining the structure prediction capabilities of RoseTTAFold and diffusion modeling principles, RFdiffusion can generate functional proteins with in vitro validated properties such as high-affinity binders and symmetric protein assemblies. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Generative Diffusion Models Towards De Novo Protein Design | |
| dc.type | Thesis | |
| dc.description.degree | Ph.D. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Doctoral | |
| thesis.degree.name | Doctor of Philosophy | |