A Data Attribution-Based Approach to Model Diagnosis in LC-MS/MS Structure Prediction
Author(s)
Khoo, Ling Min Serena
Advisor
Barzilay, Regina
Abstract
Elucidating the structures of small molecules in complex mixtures using liquid chromatography tandem mass spectrometry (LC-MS/MS) is a challenging task with far-reaching implications in areas such as drug discovery, environmental science, and metabolism research. Despite the task's importance and significant efforts to develop machine learning (ML) models that elucidate the molecular structures of unknown compounds from LC-MS/MS spectra, the performance of these models remains limited and has been reported as insufficient for practical applications. This warrants a deeper investigation into their limitations, both to advance ML-based molecular structure elucidation from LC-MS/MS and to enable its use in real-world settings. Here, we leverage data attribution methods to systematically identify and validate hypotheses about the sources of the generalization challenges that hinder current model performance. Our goal is to automatically uncover insights into the failure modes of existing ML models for LC-MS/MS, thereby laying the foundation for developing more robust and accurate models.
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology