Papers

17,973 papers found
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Xiaoyuan Wu, Weiran Lin, Omer Akgul et al.
2025 EMNLP
Estimating Machine Translation Difficulty
Lorenzo Proietti, Stefano Perrella, Vilém Zouhar et al.
2025 EMNLP
EuroGEST: Investigating gender stereotypes in multilingual language models
Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou et al.
2025 EMNLP
Evaluating and Aligning Human Economic Risk Preferences in LLMs
Jiaxin Liu, Yixuan Tang, Yi Yang et al.
2025 EMNLP
Evaluating Compositional Generalisation in VLMs and Diffusion Models
Beth Pearson, Bilal Boulbarss, Michael Wray et al.
2025 EMNLP
Evaluating Compound AI Systems through Behaviors, Not Benchmarks
Pranav Bhagat, K N Ajay Shastry, Pranoy Panda et al.
2025 EMNLP
Evaluating Cultural Knowledge and Reasoning in LLMs Through Persian Allusions
Melika Nobakhtian, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
2025 EMNLP
Evaluating distillation methods for data-efficient syntax learning
Takateru Yamakoshi, Thomas L. Griffiths, R. Thomas McCoy et al.
2025 EMNLP
Evaluating Evaluation Metrics – The Mirage of Hallucination Detection
Atharva Kulkarni, Yuan Zhang, Joel Ruben Antony Moniz et al.
2025 EMNLP
Evaluating Large Language Models for Belief Inference: Mapping Belief Networks at Scale
Trisevgeni Papakonstantinou, Antonina Zhiteneva, Ana Yutong Ma et al.
2025 EMNLP
Evaluating Large Language Models for Cross-Lingual Retrieval
Longfei Zuo, Pingjun Hong, Oliver Kraus et al.
2025 EMNLP
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel, Hrudayangam Mehta, Jeremy Blackburn
2025 EMNLP
Evaluating LLM-Generated Diagrams as Graphs
Chumeng Liang, Jiaxuan You
2025 EMNLP