Building small domain-specific masked language models vs. large generative models for clinical decision support and their effects on users.

Author(s)

Sergeeva, Elena

Advisor

Szolovits, Peter

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

A frequently adopted definition of knowledge is “justified true belief”. This definition presents some issues when applied to AI: it is unclear to what degree it is justified to use “humanizing” vocabulary like “belief” or “justification” when describing the performance of an AI system. Traditional AI based on explicit knowledge representation involves reasoning over symbolic representations of statements standing for such “justified true beliefs” [1]. The modern connectionist methodology, however, replaces explicit reasoning with predictions computed over weighted continuous representations of the inputs. The continuous representations learned by such systems remain “black box-like”: the only elements directly understandable by the human user are the model inputs and outputs. In the first part of this thesis, I introduce a set of transformer-based masked language models for a diverse set of medical natural language processing tasks, including named entity recognition, negation extraction, and relation extraction, that perform as well as or better than larger prompt-and-generate transformer-based causal language models. In the second part of the thesis, I discuss the modern “prompt-and-generate” approach to natural language processing, where both the inputs and the outputs of the model are word-like elements commonly referred to as “tokens”. I explore the nature of the token-based representation of the input and look at how token “meaning” is refined at each successive layer of the transformer computation. With respect to the outputs, I explore how people engage with AI-generated sequences of tokens that they perceive as “explained” predictions.

Date issued

2025-05

URI

https://hdl.handle.net/1721.1/164141

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses