Annotating metabolite mass spectra with domain-inspired chemical formula transformers
Author(s)
Goldman, Samuel; Wohlwend, Jeremy; Stražar, Martin; Haroush, Guy; Xavier, Ramnik J; Coley, Connor W
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Metabolomics studies have identified small molecules that mediate cell signaling, competition and disease pathology, in part due to large-scale community efforts to measure tandem mass spectra for thousands of metabolite standards. Nevertheless, the majority of spectra observed in clinical samples cannot be unambiguously matched to known structures. Deep learning approaches to small-molecule structure elucidation have surprisingly failed to rival classical statistical methods, which we hypothesize is due to the lack of in-domain knowledge incorporated into current neural network architectures. Here we introduce a neural network-driven workflow for untargeted metabolomics, Metabolite Inference with Spectrum Transformers (MIST), to annotate tandem mass spectra peaks with chemical structures. Unlike existing approaches, MIST incorporates domain insights into its architecture by encoding peaks with their chemical formula representations, implicitly featurizing pairwise neutral losses and training the network to additionally predict substructure fragments. MIST performs favorably compared with both standard neural architectures and the state-of-the-art kernel method on the task of fingerprint prediction for over 70% of metabolite standards and retrieves 66% of metabolites with equal or improved accuracy, with 29% strictly better. We further demonstrate the utility of MIST by suggesting potential dipeptide and alkaloid structures for differentially abundant spectra found in an inflammatory bowel disease patient cohort.
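To make the notion of "pairwise neutral losses" between formula-annotated peaks concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each peak has already been assigned a simple chemical formula (no brackets, charges or isotopes), and it computes neutral losses explicitly as element-count differences, whereas the abstract describes MIST as featurizing them implicitly. The function names `parse_formula`, `neutral_loss` and `pairwise_neutral_losses` and the example formulas are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C6H12O6' into element counts.

    Assumption: no brackets, charges or isotope labels.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def neutral_loss(larger: Counter, smaller: Counter):
    """Return larger - smaller as a dict of element counts.

    Returns None when any element count would be negative, i.e. when
    'smaller' is not a sub-formula of 'larger'.
    """
    diff = {}
    for element in set(larger) | set(smaller):
        delta = larger[element] - smaller[element]
        if delta < 0:
            return None
        if delta > 0:
            diff[element] = delta
    return diff


def pairwise_neutral_losses(peak_formulas):
    """Map each ordered pair (larger, smaller) to its neutral-loss formula."""
    parsed = {f: parse_formula(f) for f in peak_formulas}
    losses = {}
    for a, b in combinations(peak_formulas, 2):
        # Try both orderings; at most one yields a valid non-empty loss.
        for big, small in ((a, b), (b, a)):
            loss = neutral_loss(parsed[big], parsed[small])
            if loss:  # skips None (invalid) and {} (identical formulas)
                losses[(big, small)] = loss
    return losses


# Hypothetical peak formulas chosen only for illustration.
peaks = ["C6H12O6", "C6H10O5", "C5H8O4"]
losses = pairwise_neutral_losses(peaks)
```

In this toy example the difference between `C6H12O6` and `C6H10O5` is a loss of water (`{"H": 2, "O": 1}`). A learned model would typically consume such differences as vectors of element counts rather than as dictionaries.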
Date issued
2023-08-17
Department
Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Department of Chemical Engineering; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Abdul Latif Jameel Clinic for Machine Learning in Health; Broad Institute of MIT and Harvard
Journal
Nature Machine Intelligence
Publisher
Springer Science and Business Media LLC
Citation
Goldman, S., Wohlwend, J., Stražar, M. et al. Annotating metabolite mass spectra with domain-inspired chemical formula transformers. Nat Mach Intell 5, 965–979 (2023).
Version: Author's final manuscript