Papers
Measure only what is measurable: towards conversation requirements for evaluating task-oriented dialogue systems
Emiel Van Miltenburg, Anouck Braggaar, Emmelyn Croes et al.
Measuring Bias and Agreement in Large Language Model Presupposition Judgments
Katherine Atwell, Mandy Simons, Malihe Alikhani
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Yuming Yang, Yang Nan, Junjie Ye et al.
Measuring Gender Bias in Language Models in Farsi
Hamidreza Saffari, Mohammadamin Shafiei, Donya Rooein et al.
Measuring Label Ambiguity in Subjective Tasks using Predictive Uncertainty Estimation
Richard Alies, Elena Merdjanovska, Alan Akbik
Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality
Rahul Zalkikar, Kanchan Chandra
Measuring temporal effects of agent knowledge by date-controlled tool use
R. Patrick Xian, Qiming Cui, Stefan Bauer et al.
Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks
Ori Shapira, Shlomo Chazan, Amir David Nissan Cohen
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
Yilun Qiu, Xiaoyan Zhao, Yang Zhang et al.
Measuring What Matters: Evaluating Ensemble LLMs with Label Refinement in Inductive Coding
Angelina Parfenova, Jürgen Pfeffer
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala N. Tak, Amin Banayeeanzade, Anahita Bolourani et al.
MECoT: Markov Emotional Chain-of-Thought for Personality-Consistent Role-Playing
Yangbo Wei, Zhen Huang, Fangzhou Zhao et al.
MedCite: Can Language Models Generate Verifiable Text for Medicine?
Xiao Wang, Mengjue Tan, Qiao Jin et al.
MedDecXtract: A Clinician-Support System for Extracting, Visualizing, and Annotating Medical Decisions in Clinical Narratives
Mohamed Elgaar, Hadi Amiri, Mitra Mohtarami et al.
MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
Daniel Philip Rose, Chia-Chien Hung, Marco Lepri et al.
MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
Asma Ben Abacha, Wen-wai Yim, Yujuan Fu et al.
Medical Graph RAG: Evidence-based Medical Large Language Model via Graph Retrieval-Augmented Generation
Junde Wu, Jiayuan Zhu, Yunli Qi et al.
MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation
Hsin-Ling Hsu, Cong-Tinh Dao, Luning Wang et al.
MedSummRAG: Domain-Specific Retrieval for Medical Summarization
Guanting Luo, Yuki Arase
Meetalk: Retrieval-Augmented and Adaptively Personalized Meeting Summarization with Knowledge Learning from User Corrections
Zheng Chen, Jiang Futian, Yue Deng et al.
MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs
Qian Wang, Tianyu Wang, Zhenheng Tang et al.
MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
Junjie Zhou, Yongping Xiong, Zheng Liu et al.
MEGen: Generative Backdoor into Large Language Models via Model Editing
Jiyang Qiu, Xinbei Ma, Zhuosheng Zhang et al.
MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan, Che Liu, Xin Wang et al.