Papers
Octopus: Towards Building the Arabic Speech LLM Suite
Sara Althubaiti, Vasista Sai Lodagala, Tjad Clark et al.
ArabicWeb-Edu: Educational Quality Data for Arabic LLM Training
Majd Hawasly, Tasnim Mohiuddin, Hamdy Mubarak et al.
IslamicEval 2025: The First Shared Task of Capturing LLMs Hallucination in Islamic Content
Hamdy Mubarak, Rana Malhas, Watheq Mansour et al.
PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture
Fakhraddin Alwajih, Abdellah El Mekki, Hamdy Mubarak et al.
What did you say? Generating Child-Directed Speech Questions to Train LLMs
Whitney Poh, Michael Tombolini, Libby Barak
RecombiText: Compositional Data Augmentation for Enhancing LLM Pre-Training Datasets in Low-Resource Scenarios
Alexander Tampier, Lukas Thoma, Loris Schoenegger et al.
Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs
Himanshu Beniwal, Sailesh Panda, Birudugadda Srivibhav et al.
The Comparative Trap: Pairwise Comparisons Amplifies Biased Preferences of LLM Evaluators
Hawon Jeong, ChaeHun Park, Jimin Hong et al.
Emergent Convergence in Multi-Agent LLM Annotation
Angelina Parfenova, Alexander Denzler, Jürgen Pfeffer
PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders
Ahmed Frikha, Muhammad Reza Ar Razi, Krishna Kanth Nakka et al.
The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs
Tanja Baeumel, Josef Van Genabith, Simon Ostermann
Can LLMs Detect Ambiguous Plural Reference? An Analysis of Split-Antecedent and Mereological Reference
Dang Thi Thao Anh, Rick Nouwen, Massimo Poesio
From BERT to LLMs: Comparing and Understanding Chinese Classifier Prediction in Language Models
Ziqi Zhang, Jianfei Ma, Emmanuele Chersoni et al.
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch, Constantin Niko Weisser, Severin Field et al.
Zero-Shot Belief: A Hard Problem for LLMs
John Murzaku, Owen Rambow
Probing the Limits of Multilingual Language Understanding: Low-Resource Language Proverbs as LLM Benchmark for AI Wisdom
Surendrabikram Thapa, Kritesh Rauniyar, Hariram Veeramani et al.
Referential ambiguity and clarification requests: comparing human and LLM behaviour
Chris Madge, Matthew Purver, Massimo Poesio
Mention detection with LLMs in pair-programming dialogue
Cecilia Domingo, Paul Piwek, Svetlana Stoyanchev et al.
Findings of the Fourth Shared Task on Multilingual Coreference Resolution: Can LLMs Dethrone Traditional Approaches?
Michal Novák, Miloslav Konopik, Anna Nedoluzhko et al.
Rethinking Search: A Study of University Students’ Perspectives on Using LLMs and Traditional Search Engines in Academic Problem Solving
Md. Faiyaz Abdullah Sayeedi, Md. Sadman Haque, Zobaer Ibn Razzaque et al.
Culturally-Aware Conversations: A Framework & Benchmark for LLMs
Shreya Havaldar, Young Min Cho, Sunny Rai et al.
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf
Lingxiang Hu, Shurun Yuan, Xiaoting Qin et al.
Dialogue Acts as a Lens on Human–LLM Interaction: Analyzing Conversational Norms in Model-Generated Responses
Arunima Maitra, Dorothea French, Katharina von der Wense
Syntactic Blind Spots: How Misalignment Leads to LLMs’ Mathematical Errors
Dane A Williamson, Yangfeng Ji, Matthew B. Dwyer