Show simple item record

dc.contributor.author    Thibault, Camille
dc.contributor.author    Tian, Jacob-Junqi
dc.contributor.author    Péloquin-Skulski, Gabrielle
dc.contributor.author    Curtis, Taylor
dc.contributor.author    Zhou, James
dc.contributor.author    Laflamme, Florence
dc.contributor.author    Guan, Luke Yuxiang
dc.contributor.author    Rabbany, Reihaneh
dc.contributor.author    Godbout, Jean-François
dc.contributor.author    Pelrine, Kellin
dc.date.accessioned    2025-09-10T18:23:14Z
dc.date.available    2025-09-10T18:23:14Z
dc.date.issued    2025-08-03
dc.identifier.isbn    979-8-4007-1454-2
dc.identifier.uri    https://hdl.handle.net/1721.1/162635
dc.description    KDD ’25, Toronto, ON, Canada    en_US
dc.description.abstract    Misinformation is a complex societal issue, and mitigating solutions are difficult to create due to data deficiencies. To address this, we have curated the largest collection of (mis)information datasets in the literature, totaling 75. From these, we evaluated the quality of 36 datasets that consist of statements or claims, as well as the 9 datasets that consist of data in purely paragraph form. We assess these datasets to identify those with solid foundations for empirical work and those with flaws that could result in misleading and non-generalizable results, such as spurious correlations, or examples that are ambiguous or otherwise impossible to assess for veracity. We find the latter issue is particularly severe and affects most datasets in the literature. We further provide state-of-the-art baselines on all these datasets, but show that regardless of label quality, categorical labels may no longer give an accurate evaluation of detection model performance. Finally, we propose and highlight Evaluation Quality Assurance (EQA) as a tool to guide the field toward systemic solutions rather than inadvertently propagating issues in evaluation. Overall, this guide aims to provide a roadmap for higher quality data and better grounded evaluations, ultimately improving research in misinformation detection. All datasets and other artifacts are available at misinfo-datasets.complexdatalab.com. The extended paper, including the appendices, can be accessed via arXiv at arxiv.org/abs/2411.05060.    en_US
dc.publisher    ACM|Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2    en_US
dc.relation.isversionof    https://doi.org/10.1145/3711896.3737437    en_US
dc.rights    Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.    en_US
dc.source    Association for Computing Machinery    en_US
dc.title    A Guide to Misinformation Detection Data and Evaluation    en_US
dc.type    Article    en_US
dc.identifier.citation    Camille Thibault, Jacob-Junqi Tian, Gabrielle Péloquin-Skulski, Taylor Lynn Curtis, James Zhou, Florence Laflamme, Luke Yuxiang Guan, Reihaneh Rabbany, Jean-François Godbout, and Kellin Pelrine. 2025. A Guide to Misinformation Detection Data and Evaluation. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25). Association for Computing Machinery, New York, NY, USA, 5801–5809.    en_US
dc.contributor.department    Massachusetts Institute of Technology. Department of Political Science    en_US
dc.identifier.mitlicense    PUBLISHER_POLICY
dc.eprint.version    Final published version    en_US
dc.type.uri    http://purl.org/eprint/type/ConferencePaper    en_US
eprint.status    http://purl.org/eprint/status/NonPeerReviewed    en_US
dc.date.updated    2025-09-01T07:52:14Z
dc.language.rfc3066    en
dc.rights.holder    The author(s)
dspace.date.submission    2025-09-01T07:52:15Z
mit.license    PUBLISHER_POLICY
mit.metadata.status    Authority Work and Publication Information Needed    en_US

