Papers
Revisiting Generalization Across Difficulty Levels: It’s Not So Easy
Yeganeh Kordi, Nihal V. Nayak, Max Zuo et al.
R-GDA: Reflective Guidance Data Augmentation with Multi-Agent Feedback for Domain-Specific Named Entity Recognition
Hyeonseok Kang, Hyuk Namgoong, Goun Pyeon et al.
RiddleBench: A New Generative Reasoning Benchmark for LLMs
Deepon Halder, Alan Saji, Thanmay Jayakumar et al.
RoD-TAL: A Benchmark for Answering Questions in Romanian Driving License Exams
Andrei Vlad Man, Răzvan-Alexandru Smădu, Cristian-George Craciun et al.
Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models
Đorđe Klisura, Joseph Khoury, Ashish Kundu et al.
RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test Sets
Jan Cegin, Branislav Pecher, Ivan Srba et al.
RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
Tianyi Niu, Jaemin Cho, Elias Stengel-Eskin et al.
Router-Suggest: Dynamic Routing for Multimodal Auto-Completion in Visually-Grounded Dialogs
Sandeep Mishra, Devichand Budagam, Anubhab Mandal et al.
R-R at AbjadAuthorID Shared Task: A Fine-Tuned Approach for Kurdish Authorship Identification
Rania Azad M. San Ahmed, Rebwar M. Nabi
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
Jiapeng Wang, Jinhao Jiang, Zhiqiang Zhang et al.
SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context
Aishwarya Verma, Laud Ammah, Olivia Nercy Ndlovu Lucas et al.
Safeguarding Language Models via Self-Destruct Trapdoor
Shahar Katz, Bar Alon, Ariel Shaulov et al.
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed et al.
Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards
Aleksandra Krasnodębska, Katarzyna Dziewulska, Karolina Seweryn et al.
Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space
Andrea Ermellino, Lorenzo Malandri, Fabio Mercorio et al.
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
Jiaojiao Han, Wujiang Xu, Mingyu Jin et al.
SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn Agent Evaluation
Ryan Shea, Yunan Lu, Liang Qiu et al.
SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback
Fangyuan Xu, Rujun Han, Yanfei Chen et al.
Sahara Tokenizers at PARSEME 2.0 Subtask 1: Combining Contextual Embeddings with Structural Decoding for Multi-Word Expression Detection
Yunus Karatepe, Mert Sülük, Zeynep Tuğçe Kırımlı et al.
SALT-31: A Machine Translation Benchmark Dataset for 31 Ugandan Languages
Solomon Nsumba, Benjamin Akera, Evelyn Nafula Ouma et al.
SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph
Jiazheng Li, Yawei Wang, Qiaojing Yan et al.
Sample-Size Scaling of the African Languages NLI Evaluation
Anuj Tiwari, Oluwapelumi Ogunremu, Terry Oko-odion et al.
SarcasTürk: Turkish Context-Aware Sarcasm Detection Dataset
Niyazi Ahmet Metin, Sevde Yılmaz, Osman Enes Erdoğdu et al.
Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework
Clea Chataigner, Rebecca Ma, Prakhar Ganesh et al.