Papers
MATCH: Task-Driven Code Evaluation through Contrastive Learning
Marah Ghoummaid, Vladimir Tchuiev, Ofek Glick et al.
MathBuddy: A Multimodal System for Affective Math Tutoring
Debanjana Kar, Leopold Böss, Dacia Braca et al.
Math Natural Language Inference: this should be easy!
Valeria de Paiva, Qiyue Gao, Hai Hu et al.
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
Jakub Macina, Nico Daheim, Ido Hakimi et al.
Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
Peter Jansen, Samiah Hassan, Ruoyao Wang
MAviS: A Multimodal Conversational Assistant For Avian Species
Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly et al.
MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
Woohyun Cho, Youngmin Kim, Sunghyun Lee et al.
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
Zhen Zhang, Yifan Yang, Kai Zhen et al.
M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models
Zexuan Li, Hongliang Dai, Piji Li
MC2: A Minimum-Coverage and Dataset-Agnostic Framework for Compositional Generalization of LLMs on Semantic Parsing
Ziyao Xu, Zhe Yang, Houfeng Wang
MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol
Huihao Jing, Haoran Li, Wenbin Hu et al.
MCiteBench: A Multimodal Benchmark for Generating Text with Citations
Caiyu Hu, Yikai Zhang, Tinghui Zhu et al.
McMaster at LeWiDi-2025: Demographic-Aware RoBERTa
Aadi Sanghani, Sarvin Azadi, Virendra Jethra et al.
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
Zhiwei Liu, Jielin Qiu, Shiyu Wang et al.
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search
Yunhai Hu, Yilun Zhao, Chen Zhao et al.
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
Yinhong Liu, Jianfeng He, Hang Su et al.
Meaningful Pose-Based Sign Language Evaluation
Zifan Jiang, Colin Leong, Amit Moryossef et al.
Measuring and Mitigating Media Outlet Name Bias in Large Language Models
Seong-Jin Park, Kang-Min Kim
Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
Bufan Gao, Elisa Kreiss
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic et al.
Measuring Lexical Diversity of Synthetic Data Generated through Fine-Grained Persona Prompting
Gauri Kambhatla, Chantal Shaib, Venkata S Govindarajan
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang, Weili Cao, Longtian Bao et al.
Measuring scalar constructs in social science with LLMs
Hauke Licht, Rupak Sarkar, Patrick Y. Wu et al.
Measuring Sexism in US Elections: A Comparative Analysis of X Discourse from 2020 to 2024
Anna Fuchs, Elisa Noltenius, Caroline Weinzierl et al.
Measuring Sycophancy of Language Models in Multi-turn Dialogues
Jiseung Hong, Grace Byun, Seungone Kim et al.