Papers
AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents
Luca Gioacchini, Alexander Delsanto, Idilio Drago et al.
Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs
Soham Satyadharma, Fatemeh Sheikholeslami, Swati Kaul et al.
AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment
Xiaochong Lan, Jie Feng, Yinxing Liu et al.
AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists
Yifei Li, Hanane Nour Moussa, Ziru Chen et al.
Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks
Yuki Ichihara, Yuu Jinnai
Averroes at ImageEval 2025 Shared Task: Advancing Arabic Image Captioning with Augmentation and Two-Stage Generation
Mariam Saeed, Sarah Elshabrawy, Abdelrahman Hagrass et al.
Avoidance Decoding for Diverse Multi-Branch Story Generation
Kyeongman Park, Nakyeong Yang, Kyomin Jung
Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition
Yi Liu, Xiangrong Zhu, Xiangyu Liu et al.
AYA at PalmX 2025: Modeling Cultural and Islamic Knowledge in LLMs
Jannatul Tajrin, Bir Ballav Roy, Firoj Alam
AyahVerse at MAHED Shared Task: Fine-Tuning ArabicBERT with Preprocessing for Hope and Hate Detection
Ibad-ur-Rehman Rashid, Muhammad Hashir Khalil
A Zero-Shot Neuro-Symbolic Approach for Complex Knowledge Graph Question Answering
Prerna Agarwal, Srikanta Bedathur
Babies Learn to Look Ahead: Multi-Token Prediction in Small LMs
Ansar Aynetdinov, Alan Akbik
BabyLM’s First Constructions: Causal interventions provide a signal of learning
Joshua Rozner, Leonie Weissweiler, Cory Shain
Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
Zeping Yu, Yonatan Belinkov, Sophia Ananiadou
Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
Yulin Chen, Haoran Li, Yuan Sui et al.
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
Qinzhuo Wu, Pengzhi Gao, Wei Liu et al.
BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text
Ibrahim Al Azher, Miftahul Jannat Mokarrama, Zhishuai Guo et al.
Bag of Tricks for Sparse Mixture-of-Experts: A Benchmark Across Reasoning, Efficiency, and Safety
Mufan Qiu, Zheyu Shen, Pingzhi Li et al.
Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models
Masahiro Kaneko, Alham Fikri Aji, Timothy Baldwin
Balancing Quality and Variation: Spam Filtering Distorts Data Label Distributions
Eve Fleisig, Matthias Orlikowski, Philipp Cimiano et al.
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi, Parsa Kavehzadeh, Mehdi Rezagholizadeh et al.
BALSAM: A Platform for Benchmarking Arabic Large Language Models
Rawan Nasser Almatham, Kareem Mohamed Darwish, Raghad Al-Rasheed et al.