CONDOR: Clinical Ontology-aware Networked Data Organization and Retrieval
Author(s)
Dongo Aguirre, Gyalpo Melchisedeck
Advisor
Madden, Samuel
Abstract
Until now, state-of-the-art research into AI-driven clinical workflows has been confined to proprietary, closed-source systems from vendors such as Epic and Oracle, or to private experiments such as Stanford’s ChatEHR, creating a critical barrier to academic innovation. This thesis introduces CONDOR, the first fully open-source and replicable research environment designed to simulate an agentic, conversational AI interacting with a high-fidelity Electronic Health Record (EHR). By integrating an open-source, FHIR-native EHR (Medplum) with a complex, realistic public clinical dataset (MIMIC-IV FHIR), CONDOR provides a foundational testbed that was previously unavailable to the research community. The framework’s primary contribution is a novel alignment and evaluation methodology that adapts the principles of SelfCite to the clinical domain. We propose a ‘ClinicalConfidence’ score to quantify the trustworthiness of generated statements, and we programmatically generate a high-quality preference dataset for alignment using Simple Preference Optimization (SimPO). We compare a standard vector-based Retrieval-Augmented Generation (RAG) baseline against a more advanced GraphRAG architecture that leverages a two-tiered knowledge graph of patient data and medical ontologies. Our results demonstrate that the full CONDOR system, which combines GraphRAG with SimPO alignment, significantly improves citation quality and verifiability, establishing a new open-source benchmark for the development of safe and reliable clinical AI.
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology