Papers
Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks
Kirill Semenov, Rico Sennrich
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering
Teng Lin, Yuyu Luo, Honglin Zhang et al.
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
Ananth Agarwal, Jasper Jian, Christopher D Manning et al.
Mechanistic Fine-tuning for In-context Learning
Hakaze Cho, Peng Luo, Mariko Kato et al.
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
Ercong Nie, Helmut Schmid, Hinrich Schuetze
MedCOD: Enhancing English-to-Spanish Medical Translation of Large Language Models Using Enriched Chain-of-Dictionary Framework
Md Shahidul Salim, Lian Fu, Arav Adikesh Ramakrishnan et al.
MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing
Minghao Liu, Zhitao He, Zhiyuan Fan et al.
MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM Responses
Tong Chen, Zimu Wang, Yiyi Miao et al.
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
Shrey Pandit, Jiawei Xu, Junyuan Hong et al.
Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations
Sunhao Dai, Zhanshuo Cao, Wenjie Wang et al.
Medical Text Simplification From Jargon Detection to Jargon-Aware Prompting
Taiki Papandreou, Jan Bakker, Jaap Kamps
MediVLM: A Vision Language Model for Radiology Report Generation from Medical Images
Debanjan Goswami, Ronast Subedi, Shayok Chakraborty
MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning
Roman Christof, Farnaz Zeidi, Manuela Messelhäußer et al.
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
Jaehoon Yun, Jiwoong Sohn, Jungwoo Park et al.
MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education
Dongsuk Jang, Ziyao Shangguan, Kyle Tegtmeyer et al.
Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents
Guangfu Guo, Xiaoqian Lu, Yue Feng
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf
Lingxiang Hu, Shurun Yuan, Xiaoting Qin et al.
Membership and Memorization in LLM Knowledge Distillation
Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu et al.
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
Zixin Chen, Hongzhan Lin, Kaixin Li et al.
MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
Mohamed Bayan Kmainasi, Abul Hasnat, Md Arid Hasan et al.
MemeInterpret: Towards an All-in-One Dataset for Meme Understanding
Jeongsik Park, Khoi P. N. Nguyen, Jihyung Park et al.
MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
Zhengyi Zhao, Shubo Zhang, Yuxi Zhang et al.
MemInsight: Autonomous Memory Augmentation for LLM Agents
Rana Salama, Jason Cai, Michelle Yuan et al.
Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
Jisu Kim, Youngwoo Shin, Uiji Hwang et al.
Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Boxiang Ma, Ru Li, Wang Yuanlong et al.