Papers
Evaluating Multilingual Long-Context Models for Retrieval and Reasoning
Ameeta Agrawal, Andy Dang, Sina Bagheri Nezhad et al.
Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill, Noah A. Smith, Yanai Elazar
Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams
Roberts Darģis, Guntis Bārzdiņš, Inguna Skadiņa et al.
Evaluating Psychological Safety of Large Language Models
Xingxuan Li, Yutong Li, Lin Qiu et al.
Evaluating Readability and Faithfulness of Concept-based Explanations
Meng Li, Haoran Jin, Ruixuan Huang et al.
Evaluating Short-Term Temporal Fluctuations of Social Biases in Social Media Data and Masked Language Models
Yi Zhou, Danushka Bollegala, Jose Camacho-Collados
Evaluating the Effectiveness of Large Language Models in Establishing Conversational Grounding
Biswesh Mohapatra, Manav Nitin Kapadnis, Laurent Romary et al.
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li, Baolin Peng, Pengcheng He et al.
Evaluating the Simplification of Brazilian Legal Rulings in LLMs Using Readability Scores as a Target
Antonio Flavio Paula, Celso Camilo-Junior
Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set)
Jiayi Wang, David Ifeoluwa Adelani, Pontus Stenetorp
Evaluation and Large-scale Training for Contextual Machine Translation
Matt Post, Marcin Junczys-Dowmunt
Evaluation of Question Answer Generation for Portuguese: Insights and Datasets
Felipe Paula, Cassiana Roberta Lizzoni Michelin, Viviane Moreira
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim, Wonho Song, Dahyun Kim et al.
EVEDIT: Event-based Knowledge Editing for Deterministic Knowledge Propagation
Jiateng Liu, Pengfei Yu, Yuji Zhang et al.
Event Causality Identification with Synthetic Control
Haoyu Wang, Fengze Liu, Jiayao Zhang et al.
Event-Keyed Summarization
William Gantt, Alexander Martin, Pavlo Kuchmiichuk et al.
Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs
Ronit Singal, Pransh Patwa, Parth Patwa et al.
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering
Sungho Ko, Hyunjin Cho, Hyungjoo Chae et al.
Evidence Retrieval for Fact Verification using Multi-stage Reranking
Shrikant Malviya, Stamos Katsigiannis
Evolutionary Contrastive Distillation for Language Model Alignment
Julian Katz-Samuels, Zheng Li, Hyokun Yun et al.
EvoR: Evolving Retrieval for Code Generation
Hongjin Su, Shuyang Jiang, Yuhang Lai et al.
Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus
Craig Messner, Tom Lippincott
Expanding FLORES+ Benchmark for More Low-Resource Settings: Portuguese-Emakhuwa Machine Translation Evaluation
Felermino Dario Mario Ali, Henrique Lopes Cardoso, Rui Sousa-Silva
Expanding the FLORES+ Multilingual Benchmark with Translations for Aragonese, Aranese, Asturian, and Valencian
Juan Antonio Perez-Ortiz, Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena et al.