Papers
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Akul Gupta et al.
Do Political Opinions Transfer Between Western Languages? An Analysis of Unaligned and Aligned Multilingual LLMs
Franziska Weeber, Tanise Ceron, Sebastian Padó
H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents
Haoran Sun, Shaoning Zeng, Bob Zhang
Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Yakup Abrek Er, Ilker Kesen, Gözde Gül Şahin et al.
Persona Prompting as a Lens on LLM Social Reasoning
Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov et al.
Lexical Popularity: Quantifying the Impact of Pre-training for LLM Performance
Elena Sofia Ruzzetti, Fabio Massimo Zanzotto, Tommaso Caselli
Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
Paul He, Yinya Huang, Mrinmaya Sachan et al.
CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
Punya Syon Pandey, Yongjin Yang, Jiarui Liu et al.
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns, Letitia Parcalabescu, Stephan Waeldchen et al.
Attacker’s Noise Can Manipulate Your Audio-based LLM in the Real World
Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews et al.
Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework
Clea Chataigner, Rebecca Ma, Prakhar Ganesh et al.
AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation
Shuai Wang, Harrisen Scells, Bevan Koopman et al.
Improving LLM Domain Certification with Pretrained Guide Models
Jiaqian Zhang, Zhaozhi Qian, Faroq AL-Tam et al.
Coordinates from Context: Using LLMs to Ground Complex Location References
Tessa Masis, Brendan O'Connor
SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine
Hoang-Quoc Nguyen-Son, Minh-Son Dao, Koji Zettsu
Unraveling LLM Jailbreaks Through Safety Knowledge Neurons
Chongwen Zhao, Yutong Ke, Kaizhu Huang
Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?
Kai Sun, Yin Huang, Srishti Mehra et al.
Don’t Judge a Book by its Cover: Testing LLMs’ Robustness Under Logical Obfuscation
Abhilekh Borah, Shubhra Ghosh, Kedar Joshi et al.
Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs
Rahul Thapa, Qingyang Wu, Kevin Wu et al.
AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs
Busayo Awobade, Mardhiyah Sanni, Tassallah Abdullahi et al.
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank, Haithem Afli
How Good Are LLMs at Processing Tool Outputs?
Kiran Kate, Yara Rizk, Poulami Ghosh et al.
Tug-of-war between idioms’ figurative and literal interpretations in LLMs
Soyoung Oh, Xinting Huang, Mathis Pink et al.
Do LLM hallucination detectors suffer from low-resource effect?
Debtanu Datta, Mohan Kishore Chilukuri, Yash Kumar et al.