dc.contributor.author: Goldman, Samuel
dc.contributor.author: Wohlwend, Jeremy
dc.contributor.author: Stražar, Martin
dc.contributor.author: Haroush, Guy
dc.contributor.author: Xavier, Ramnik J
dc.contributor.author: Coley, Connor W
dc.date.accessioned: 2026-04-14T18:27:28Z
dc.date.available: 2026-04-14T18:27:28Z
dc.date.issued: 2023-08-17
dc.identifier.uri: https://hdl.handle.net/1721.1/165431
dc.description.abstract: Metabolomics studies have identified small molecules that mediate cell signaling, competition and disease pathology, in part due to large-scale community efforts to measure tandem mass spectra for thousands of metabolite standards. Nevertheless, the majority of spectra observed in clinical samples cannot be unambiguously matched to known structures. Deep learning approaches to small-molecule structure elucidation have surprisingly failed to rival classical statistical methods, which we hypothesize is due to the lack of in-domain knowledge incorporated into current neural network architectures. Here we introduce a neural network-driven workflow for untargeted metabolomics, Metabolite Inference with Spectrum Transformers (MIST), to annotate tandem mass spectra peaks with chemical structures. Unlike existing approaches, MIST incorporates domain insights into its architecture by encoding peaks with their chemical formula representations, implicitly featurizing pairwise neutral losses and training the network to additionally predict substructure fragments. MIST performs favorably compared with both standard neural architectures and the state-of-the-art kernel method on the task of fingerprint prediction for over 70% of metabolite standards and retrieves 66% of metabolites with equal or improved accuracy, with 29% strictly better. We further demonstrate the utility of MIST by suggesting potential dipeptide and alkaloid structures for differentially abundant spectra found in an inflammatory bowel disease patient cohort. [en_US]
dc.language.iso: en
dc.publisher: Springer Science and Business Media LLC [en_US]
dc.relation.isversionof: 10.1038/s42256-023-00708-3 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: author [en_US]
dc.title: Annotating metabolite mass spectra with domain-inspired chemical formula transformers [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Goldman, S., Wohlwend, J., Stražar, M. et al. Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nat Mach Intell 5, 965–979 (2023). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Chemical Engineering [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.department: Abdul Latif Jameel Clinic for Machine Learning in Health [en_US]
dc.contributor.department: Broad Institute of MIT and Harvard [en_US]
dc.relation.journal: Nature Machine Intelligence [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2026-04-14T18:21:44Z
dspace.orderedauthors: Goldman, S; Wohlwend, J; Stražar, M; Haroush, G; Xavier, RJ; Coley, CW [en_US]
dspace.date.submission: 2026-04-14T18:21:47Z
mit.journal.volume: 5 [en_US]
mit.journal.issue: 9 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

