Papers
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
Yinhong Liu, Jianfeng He, Hang Su et al.
Meaningful Pose-Based Sign Language Evaluation
Zifan Jiang, Colin Leong, Amit Moryossef et al.
Measuring and Mitigating Media Outlet Name Bias in Large Language Models
Seong-Jin Park, Kang-Min Kim
Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
Bufan Gao, Elisa Kreiss
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic et al.
Measuring Lexical Diversity of Synthetic Data Generated through Fine-Grained Persona Prompting
Gauri Kambhatla, Chantal Shaib, Venkata S Govindarajan
Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
Jianyou Wang, Weili Cao, Longtian Bao et al.
Measuring scalar constructs in social science with LLMs
Hauke Licht, Rupak Sarkar, Patrick Y. Wu et al.
Measuring Sexism in US Elections: A Comparative Analysis of X Discourse from 2020 to 2024
Anna Fuchs, Elisa Noltenius, Caroline Weinzierl et al.
Measuring Sycophancy of Language Models in Multi-turn Dialogues
Jiseung Hong, Grace Byun, Seungone Kim et al.
Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks
Kirill Semenov, Rico Sennrich
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering
Teng Lin, Yuyu Luo, Honglin Zhang et al.
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
Ananth Agarwal, Jasper Jian, Christopher D Manning et al.
Mechanistic Fine-tuning for In-context Learning
Hakaze Cho, Peng Luo, Mariko Kato et al.
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
Ercong Nie, Helmut Schmid, Hinrich Schuetze
MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
Md Shahidul Salim, Lian Fu, Arav Adikesh Ramakrishnan et al.
MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing
Minghao Liu, Zhitao He, Zhiyuan Fan et al.
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM Responses
Tong Chen, Zimu Wang, Yiyi Miao et al.
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
Shrey Pandit, Jiawei Xu, Junyuan Hong et al.
Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations
Sunhao Dai, Zhanshuo Cao, Wenjie Wang et al.
Medical Text Simplification: From Jargon Detection to Jargon-Aware Prompting
Taiki Papandreou, Jan Bakker, Jaap Kamps
MediVLM: A Vision Language Model for Radiology Report Generation from Medical Images
Debanjan Goswami, Ronast Subedi, Shayok Chakraborty
MedLingua at MedArabiQ2025: Zero- and Few-Shot Prompting of Large Language Models for Arabic Medical QA
Fatimah Mohamed Emad Elden, Mumina Ab. Abukar
MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning
Roman Christof, Farnaz Zeidi, Manuela Messelhäußer et al.