Papers
Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis et al.
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments
Sourjyadip Ray, Kushal Gupta, Soumi Kundu et al.
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu, Enmao Diao
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models
Haiquan Zhao, Lingyu Li, Shisong Chen et al.
ESG-Kor: A Korean Dataset for ESG-related Information Extraction and Practical Use Cases
Jaeyoung Lee, Geonyeong Son, Misuk Kim
Estimating Knowledge in Large Language Models Without Generating a Single Token
Daniela Gottesman, Mor Geva
EU DisinfoTest: a Benchmark for Evaluating Language Models’ Ability to Detect Disinformation Narratives
Witold Sosnowski, Arkadiusz Modzelewski, Kinga Skorupska et al.
Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers
Lukas Hilgert, Danni Liu, Jan Niehues
Evaluating Automatic Metrics with Incremental Machine Translation Systems
Guojun Wu, Shay B Cohen, Rico Sennrich
Evaluating Biases in Context-Dependent Sexual and Reproductive Health Questions
Sharon Levy, Tahilin Sanchez Karver, William Adler et al.
Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works
Xinfeng Yuan, Siyu Yuan, Yuhan Cui et al.
Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark
Funing Yang, Carolyn Jane Anderson
Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets
Vatsal Gupta, Pranshu Pandya, Tushar Kataria et al.
Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains
Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan et al.
Evaluating Diversity in Automatic Poetry Generation
Yanran Chen, Hannes Gröner, Sina Zarrieß et al.
Evaluating D-MERIT of Partial-annotation on Information Retrieval
Royi Rassin, Yaron Fairstein, Oren Kalinsky et al.
Evaluating Gender Bias of LLMs in Making Morality Judgements
Divij Bajaj, Yuanyuan Lei, Jonathan Tong et al.
Evaluating Language Model Character Traits
Francis Rhys Ward, Zejia Yang, Alex Jackson et al.
Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts
Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama et al.
Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization
Niyati Bafna, Kenton Murray, David Yarowsky
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Elizabeth Fons, Rachneet Kaur, Soham Palande et al.
Evaluating Large Language Models via Linguistic Profiling
Alessio Miaschi, Felice Dell’Orletta, Giulia Venturi
Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts
Sumit Asthana, Hannah Rashkin, Elizabeth Clark et al.
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
Xuelin Liu, Yanfei Zhu, Shucheng Zhu et al.