
dc.contributor.author: Yu, Miao
dc.contributor.author: Meng, Fanci
dc.contributor.author: Zhou, Xinyun
dc.contributor.author: Wang, Shilong
dc.contributor.author: Mao, Junyuan
dc.contributor.author: Pan, Linsey
dc.contributor.author: Chen, Tianlong
dc.contributor.author: Wang, Kun
dc.contributor.author: Li, Xinfeng
dc.contributor.author: Zhang, Yongfeng
dc.contributor.author: An, Bo
dc.contributor.author: Wen, Qingsong
dc.date.accessioned: 2025-09-02T19:28:29Z
dc.date.available: 2025-09-02T19:28:29Z
dc.date.issued: 2025-08-03
dc.identifier.isbn: 979-8-4007-1454-2
dc.identifier.uri: https://hdl.handle.net/1721.1/162598
dc.description: KDD ’25, Toronto, ON, Canada [en_US]
dc.description.abstract: With the rapid evolution of Large Language Models (LLMs), LLM-based agents and Multi-agent Systems (MAS) have significantly expanded the capabilities of LLM ecosystems. This evolution stems from empowering LLMs with additional modules such as memory, tools, environment, and even other agents. However, this advancement has also introduced more complex issues of trustworthiness, which previous research focusing solely on LLMs could not cover. In this survey, we propose the TrustAgent framework, a comprehensive study on the trustworthiness of agents, characterized by modular taxonomy, multi-dimensional connotations, and technical implementation. By thoroughly investigating and summarizing newly emerged attacks, defenses, and evaluation methods for agents and MAS, we extend the concept of Trustworthy LLM to the emerging paradigm of Trustworthy Agent. In TrustAgent, we begin by deconstructing and introducing various components of the Agent and MAS. Then, we categorize their trustworthiness into intrinsic (brain, memory, and tool) and extrinsic (user, agent, and environment) aspects. Subsequently, we delineate the multifaceted meanings of trustworthiness and elaborate on the implementation techniques of existing research related to these internal and external modules. Finally, we present our insights and outlook on this domain, aiming to provide guidance for future endeavors. For easy reference, we categorize all the studies mentioned in this survey according to our taxonomy, available at: https://github.com/Ymm-cll/TrustAgent. [en_US]
dc.publisher: ACM | Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 [en_US]
dc.relation.isversionof: https://doi.org/10.1145/3711896.3736561 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Association for Computing Machinery [en_US]
dc.title: A Survey on Trustworthy LLM Agents: Threats and Countermeasures [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Miao Yu, Fanci Meng, Xinyun Zhou, Shilong Wang, Junyuan Mao, Linsey Pan, Tianlong Chen, Kun Wang, Xinfeng Li, Yongfeng Zhang, Bo An, and Qingsong Wen. 2025. A Survey on Trustworthy LLM Agents: Threats and Countermeasures. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25). Association for Computing Machinery, New York, NY, USA, 6216–6226. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.identifier.mitlicense: PUBLISHER_POLICY
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2025-09-01T07:49:54Z
dc.language.rfc3066: en
dc.rights.holder: The author(s)
dspace.date.submission: 2025-09-01T07:49:55Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

