Papers
Evaluating Diversity in Automatic Poetry Generation
Yanran Chen, Hannes Gröner, Sina Zarrieß et al.
Evaluating Gender Bias of LLMs in Making Morality Judgements
Divij Bajaj, Yuanyuan Lei, Jonathan Tong et al.
Evaluating Language Model Character Traits
Francis Rhys Ward, Zejia Yang, Alex Jackson et al.
Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts
Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama et al.
Evaluating Large Language Models along Dimensions of Language Variation: A Systematik Invesdigatiom uv Cross-lingual Generalization
Niyati Bafna, Kenton Murray, David Yarowsky
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark
Elizabeth Fons, Rachneet Kaur, Soham Palande et al.
Evaluating Large Language Models via Linguistic Profiling
Alessio Miaschi, Felice Dell’Orletta, Giulia Venturi
Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts
Sumit Asthana, Hannah Rashkin, Elizabeth Clark et al.
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
Xuelin Liu, Yanfei Zhu, Shucheng Zhu et al.
Evaluating Multilingual Long-Context Models for Retrieval and Reasoning
Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad et al.
Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill, Noah A. Smith, Yanai Elazar
Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams
Roberts Darģis, Guntis Bārzdiņš, Inguna Skadiņa et al.
Evaluating Psychological Safety of Large Language Models
Xingxuan Li, Yutong Li, Lin Qiu et al.
Evaluating Readability and Faithfulness of Concept-based Explanations
Meng Li, Haoran Jin, Ruixuan Huang et al.
Evaluating Short-Term Temporal Fluctuations of Social Biases in Social Media Data and Masked Language Models
Yi Zhou, Danushka Bollegala, Jose Camacho-Collados
Evaluating the Effectiveness of Large Language Models in Establishing Conversational Grounding
Biswesh Mohapatra, Manav Nitin Kapadnis, Laurent Romary et al.
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li, Baolin Peng, Pengcheng He et al.
Evaluating the Simplification of Brazilian Legal Rulings in LLMs Using Readability Scores as a Target
Antonio Flavio Paula, Celso Camilo-Junior
Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set)
Jiayi Wang, David Ifeoluwa Adelani, Pontus Stenetorp
Evaluation and Large-scale Training for Contextual Machine Translation
Matt Post, Marcin Junczys-Dowmunt
Evaluation of Question Answer Generation for Portuguese: Insights and Datasets
Felipe Paula, Cassiana Roberta Lizzoni Michelin, Viviane Moreira
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim, Wonho Song, Dahyun Kim et al.
EVEDIT: Event-based Knowledge Editing for Deterministic Knowledge Propagation
Jiateng Liu, Pengfei Yu, Yuji Zhang et al.