Papers
MAPS: A Multilingual Benchmark for Agent Performance and Security
Omer Hofman, Jonathan Brokman, Oren Rachmil et al.
MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory
Vasudha Varadarajan, Hui Xu, Rebecca Astrid Böhme et al.
Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code
Jungin Kim, Shinwoo Park, Yo-Sub Han
Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding
Huayu Li, ZhengXiao He, Siyuan Tian et al.
Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?
Karin De Langis, Püren Öncel, Ryan Peters et al.
Mask What Matters: Mitigating Object Hallucinations in Multimodal Large Language Models with Object-Aligned Visual Contrastive Decoding
Boqi Chen, Xudong Liu, Jianing Qiu
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen
MATH-IDN: A Multilingual Mathematical Problem Solving Dataset Featuring Local Languages in Indonesia
Xiao Xiao, Iftitahu Ni'mah, Yuyun Wabula et al.
MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning
Mahbub E Sobhani, Md. Faiyaz Abdullah Sayeedi, Tasnim Mohiuddin et al.
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Qian Wang, Ziqi Huang, Ruoxi Jia et al.
MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic
Rana Gaber, Yara Allam, Serag Amin et al.
McMining: Automated Discovery of Misconceptions in Student Code
Erfan Al-Hossami, Razvan Bunescu
Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality
Sondre Wold, Étienne Simon, Erik Velldal et al.
Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli, Arturo Mp, Frederikus Hudi et al.
Measuring LLMs’ Sensitivity to Paraphrased Opinion Prompts
Bushra Alhetelah, Irfan Ahmad
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Zhengyang Shan, Aaron Mueller
Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs
Joonatan Laato, Veera Schroderus, Jenna Kanerva et al.
Measuring the Symbolic Power of Languages with LLM-based Multilingual Persuasion Simulation
Yin Jou Huang, Fei Cheng
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
John Mendonça, Alon Lavie, Isabel Trancoso
MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical Skills
Zonghai Yao, Zihao Zhang, Chaolong Tang et al.
MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings
Jean-Philippe Corbeil, Minseon Kim, Maxime Griot et al.
MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment
Omid Ghahroodi, Arshia Hemmat, Marzia Nouri et al.