Papers
Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
Rana Shahroz, Zhen Tan, Sukwon Yun et al.
On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs
Herun Wan, Minnan Luo, Zhixiong Su et al.
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean
SungHo Kim, Nayeon Kim, Taehee Jeon et al.
NexusSum: Hierarchical LLM Agents for Long-Form Narrative Summarization
Hyuntak Kim, Byung-Hak Kim
Exploring the Potential of LLMs as Personalized Assistants: Dataset, Evaluation, and Analysis
Jisoo Mok, Ik-hwan Kim, Sangkwon Park et al.
Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach
Shenglai Zeng, Pengfei He, Kai Guo et al.
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu, Wenbiao Yin, Yong Jiang et al.
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
Hongxin Li, Jingfan Chen, Jingran Su et al.
Praetor: A Fine-Grained Generative LLM Evaluator with Instance-Level Customizable Evaluation Criteria
Yongqi Leng, Renren Jin, Yue Chen et al.
ExpeTrans: LLMs Are Experiential Transfer Learners
Jinglong Gao, Xiao Ding, Lingxiao Zou et al.
Top-n𝜎: Eliminating Noise in Logit Space for Robust Token Sampling of LLM
Chenxia Tang, Jianchun Liu, Hongli Xu et al.
MoQAE: Mixed-Precision Quantization for Long-Context LLM Inference via Mixture of Quantization-Aware Experts
Wei Tao, Haocheng Lu, Xiaoyang Qu et al.
GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Qingchen Yu, Zifan Zheng, Ding Chen et al.
Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
Weixiang Zhao, Yulin Hu, Yang Deng et al.
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding, Wentao Jiang, Shunyu Liu et al.
Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
Junyi Chen, Shihao Bai, Zaijun Wang et al.
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
Lingxiao Diao, Xinyue Xu, Wanxuan Sun et al.
TC–RAG: Turing–Complete RAG’s Case study on Medical LLM Systems
Xinke Jiang, Yue Fang, Rihong Qiu et al.
VMLU Benchmarks: A comprehensive benchmark toolkit for Vietnamese LLMs
Cuc Thi Bui, Nguyen Truong Son, Truong Van Trang et al.
Scaling up the State Size of RNN LLMs for Long-Context Scenarios
Kai Liu, Jianfei Gao, Kai Chen
PaSa: An LLM Agent for Comprehensive Academic Paper Search
Yichen He, Guanhua Huang, Peiyuan Feng et al.
HyKGE: A Hypothesis Knowledge Graph Enhanced RAG Framework for Accurate and Reliable Medical LLMs Responses
Xinke Jiang, Ruizhe Zhang, Yongxin Xu et al.
UniLR: Unleashing the Power of LLMs on Multiple Legal Tasks with a Unified Legal Retriever
Ang Li, Yiquan Wu, Yifei Liu et al.
HomeBench: Evaluating LLMs in Smart Homes with Valid and Invalid Instructions Across Single and Multiple Devices
Silin Li, Yuhang Guo, Jiashu Yao et al.
Enhancing Interpretable Image Classification Through LLM Agents and Conditional Concept Bottleneck Models
Yiwen Jiang, Deval Mehta, Wei Feng et al.