Papers
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Wei-Ling Hsu, Yu-Chien Tang, An-Zi Yen
MATH-IDN: A Multilingual Mathematical Problem Solving Dataset Featuring Local Languages in Indonesia
Xiao Xiao, Iftitahu Ni'mah, Yuyun Wabula et al.
MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning
Mahbub E Sobhani, Md. Faiyaz Abdullah Sayeedi, Tasnim Mohiuddin et al.
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Qian Wang, Ziqi Huang, Ruoxi Jia et al.
MBZUAI at AMIYA Shared Task 2026: Adapting Open-Source LLMs for Dialectal Arabic
Rana Gaber, Yara Allam, Serag Amin et al.
McMining: Automated Discovery of Misconceptions in Student Code
Erfan Al-Hossami, Razvan Bunescu
Measuring Idiomaticity in Text Embedding Models with epsilon-compositionality
Sondre Wold, Étienne Simon, Erik Velldal et al.
Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli, Arturo Mp, Frederikus Hudi et al.
Measuring LLMs’ Sensitivity to Paraphrased Opinion Prompts
Bushra Alhetelah, Irfan Ahmad
Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Zhengyang Shan, Aaron Mueller
Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs
Joonatan Laato, Veera Schroderus, Jenna Kanerva et al.
Measuring the Symbolic Power of Languages with LLM-based Multilingual Persuasion Simulation
Yin Jou Huang, Fei Cheng
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
John Mendonça, Alon Lavie, Isabel Trancoso
MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical Skills
Zonghai Yao, Zihao Zhang, Chaolong Tang et al.
MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings
Jean-Philippe Corbeil, Minseon Kim, Maxime Griot et al.
MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment
Omid Ghahroodi, Arshia Hemmat, Marzia Nouri et al.
MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection
Paolo Italiani, David Gimeno-Gómez, Luca Ragazzi et al.
MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder-LLM Integration in Cross-Lingual Reasoning
Kosei Uemura, David Guzmán, Quang Phuoc Nguyen et al.
MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments
Roelien C. Timmer, Necva Bölücü, Stephen Wan
MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval
Delvin Ce Zhang, Suhan Cui, Zhelin Chu et al.
MIMIC: Multi-party Dialogue Augmentation via Speaker Stylistic Transfer
Gaetano Cimino, Giuseppe Carenini, Vincenzo Deufemia