Papers
Masking in Multi-hop QA: An Analysis of How Language Models Perform with Context Permutation
Wenyu Huang, Pavlos Vougiouklis, Mirella Lapata et al.
Masks Can be Learned as an Alternative to Experts
Peiyu Liu, Tianwen Wei, Bo Zhu et al.
MasRouter: Learning to Route LLMs for Multi-Agent Systems
Yanwei Yue, Guibin Zhang, Boyang Liu et al.
Massively Multilingual Instruction-Following Information Extraction
Thang Le, Huy Huu Nguyen, Anh Tuan Luu et al.
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data
Vageesh Kumar Saxena, Benjamin Ashpole, Gijs Van Dijck et al.
MateInfoUB: A Real-World Benchmark for Testing LLMs in Competitive, Multilingual, and Multimodal Educational Tasks
Marius Dumitran, Mihnea Buca, Theodor Moroianu
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection
Yibo Yan, Shen Wang, Jiahao Huo et al.
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Ke Wang, Junting Pan, Linda Wei et al.
MathD2: Towards Disambiguation of Mathematical Terms
Shufan Jiang, Mary Ann Tan, Harald Sack
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion
Qizhi Pei, Lijun Wu, Zhuoshi Pan et al.
Math Neurosurgery: Isolating Language Models’ Math Reasoning Abilities Using Only Forward Passes
Bryan R Christ, Zachary Gottesman, Jonathan Kropko et al.
Matina: A Culturally-Aligned Persian Language Model Using Multiple LoRA Experts
Sara Bourbour Hosseinbeigi, MohammadAli SeifKashani, Javad Seraj et al.
MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation
Yile Liu, Ziwei Ma, Xiu Jiang et al.
Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval
Hani Alomari, Anushka Sivakumar, Andrew Zhang et al.
Maximizing the Effectiveness of Larger BERT Models for Compression
Wen-Shu Fan, Su Lu, Shangyu Xing et al.
Maximum Score Routing For Mixture-of-Experts
Bowen Dong, Yilong Fan, Yutao Sun et al.
McBE: A Multi-task Chinese Bias Evaluation Benchmark for Large Language Models
Tian Lan, Xiangdong Su, Xu Liu et al.
McGill-NLP at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Vivek Verma, David Ifeoluwa Adelani
MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency
Junzhe Zhang, Huixuan Zhang, Xunjian Yin et al.
MCQFormatBench: Robustness Tests for Multiple-Choice Questions
Hiroo Takizawa, Saku Sugawara, Akiko Aizawa
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
Yang Liu, Jiahuan Cao, Hiuyi Cheng et al.
MDBench: A Synthetic Multi-Document Reasoning Benchmark Generated with Knowledge Guidance
Joseph J Peper, Wenzhao Qiu, Ali Payani et al.
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu, Bowen Shi, Avi Caciularu et al.
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
Bohan Jin, Shuhan Qi, Kehai Chen et al.
Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility
Xiaomeng Zhu, Zhenghao Zhou, Simon Charlow et al.