Papers
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
Wenkai Yang, Xiaohan Bi, Yankai Lin et al.
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories
Alperen Yildiz, Sin G Teo, Yiling Lou et al.
TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues
Yubin Ge, Salvatore Romeo, Jason Cai et al.
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu et al.
FLAIRR-TS - Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series
Gunjan Jalori, Preetika Verma, Sercan O Arik
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
Boyang Zhang, Yicong Tan, Yun Shen et al.
Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
Minghao Shao, Nanda Rani, Kimberly Milner et al.
Can Graph Learning Improve Planning in LLM-based Agents?
Xixi Wu, Yifei Shen, Caihua Shan et al.
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Zhenyu Guan, Xiangyu Kong, Fangwei Zhong et al.
OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following
Haochen Shi, Zhiyuan Sun, Xingdi Yuan et al.
AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Junting Lu, Zhiyang Zhang, Fangkai Yang et al.
Embracing Imperfection: Simulating Students with Diverse Cognitive Levels Using LLM-based Agents
Tao Wu, Jingyuan Chen, Wang Lin et al.
Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents
Niklas Hemken, Sai Koneru, Florian Jacob et al.
A Survey of LLM-based Agents in Medicine: How far are we from Baymax?
Wenxuan Wang, Zizhan Ma, Zheng Wang et al.
MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents
Haoran Tan, Zeyu Zhang, Chen Ma et al.
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
An Evaluation Mechanism of LLM-based Agents on Manipulating APIs
Bing Liu, Zhou Jianxiang, Dan Meng et al.
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
Wenyue Hua, Xianjun Yang, Mingyu Jin et al.
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Ruixuan Xiao, Wentao Ma, Ke Wang et al.
Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks
Yun-Shiuan Chuang, Krirk Nirunwiroj, Zach Studdiford et al.
SPARK: Simulating the Co-evolution of Stance and Topic Dynamics in Online Discourse with LLM-based Agents
Bowen Zhang, Yi Yang, Fuqiang Niu et al.
TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee, Daseong Jang, Dhammiko Arya et al.
Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents
Tianmi Ma, Jiawei Du, Wenxin Huang et al.
HEAL: Hybrid Enhancement with LLM-based Agents for Text-attributed Hypergraph Self-supervised Representation Learning
Ruochang Li, Xiao Luo, Zhiping Xiao et al.