Show simple item record

dc.contributor.author    Thibault, Camille
dc.contributor.author    Tian, Jacob-Junqi
dc.contributor.author    Péloquin-Skulski, Gabrielle
dc.contributor.author    Curtis, Taylor
dc.contributor.author    Zhou, James
dc.contributor.author    Laflamme, Florence
dc.contributor.author    Guan, Luke Yuxiang
dc.contributor.author    Rabbany, Reihaneh
dc.contributor.author    Godbout, Jean-François
dc.contributor.author    Pelrine, Kellin
dc.date.accessioned    2025-09-10T18:23:14Z
dc.date.available    2025-09-10T18:23:14Z
dc.date.issued    2025-08-03
dc.identifier.isbn    979-8-4007-1454-2
dc.identifier.uri    https://hdl.handle.net/1721.1/162635
dc.description    KDD ’25, Toronto, ON, Canada    en_US
dc.description.abstract    Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as spurious correlations, or examples that are ambiguous or otherwise impossible to assess for veracity. We find the latter issue is particularly severe and affects most datasets in the literature. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. Finally, we propose and highlight Evaluation Quality Assurance (EQA) as a tool to guide the field toward systemic solutions rather than inadvertently propagating issues in evaluation. Overall, this guide aims to provide a roadmap for higher quality data and better grounded evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at misinfo-datasets.complexdatalab.com. The extended paper, including the appendices, can be accessed via arXiv at arxiv.org/abs/2411.05060.    en_US
dc.publisher    ACM|Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2    en_US
dc.relation.isversionof    https://doi.org/10.1145/3711896.3737437    en_US
dc.rights    Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.    en_US
dc.source    Association for Computing Machinery    en_US
dc.title    A Guide to Misinformation Detection Data and Evaluation    en_US
dc.type    Article    en_US
dc.identifier.citation    Camille Thibault, Jacob-Junqi Tian, Gabrielle Péloquin-Skulski, Taylor Lynn Curtis, James Zhou, Florence Laflamme, Luke Yuxiang Guan, Reihaneh Rabbany, Jean-François Godbout, and Kellin Pelrine. 2025. A Guide to Misinformation Detection Data and Evaluation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25). Association for Computing Machinery, New York, NY, USA, 5801–5809.    en_US
dc.contributor.department    Massachusetts Institute of Technology. Department of Political Science    en_US
dc.identifier.mitlicense    PUBLISHER_POLICY
dc.eprint.version    Final published version    en_US
dc.type.uri    http://purl.org/eprint/type/ConferencePaper    en_US
eprint.status    http://purl.org/eprint/status/NonPeerReviewed    en_US
dc.date.updated    2025-09-01T07:52:14Z
dc.language.rfc3066    en
dc.rights.holder    The author(s)
dspace.date.submission    2025-09-01T07:52:15Z
mit.license    PUBLISHER_POLICY
mit.metadata.status    Authority Work and Publication Information Needed    en_US

