Papers
ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents
Navid Madani, Rohini Srihari
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
Chaoyue He, Xin Zhou, Yi Wu et al.
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Xiaoyuan Wu, Weiran Lin, Omer Akgul et al.
Estimating Machine Translation Difficulty
Lorenzo Proietti, Stefano Perrella, Vilém Zouhar et al.
ET-MIER: Entity Type-guided Key Mention Identification and Evidence Retrieval for Document-level Relation Extraction
Xin Li, Huangming Xu, Fu Zhang et al.
EuroGEST: Investigating gender stereotypes in multilingual language models
Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou et al.
Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk Appetite?
Divij Chawla, Ashita Bhutada, Duc Anh Do et al.
Evaluating and Aligning Human Economic Risk Preferences in LLMs
Jiaxin Liu, Yixuan Tang, Yi Yang et al.
Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts
ChaeHun Park, Hojun Cho, Jaegul Choo
Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
Deuksin Kwon, Kaleen Shrestha, Bin Han et al.
Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social Media
Yujie Wang, Yunwei Zhao, Jing Yang et al.
Evaluating Compositional Generalisation in VLMs and Diffusion Models
Beth Pearson, Bilal Boulbarss, Michael Wray et al.
Evaluating Compound AI Systems through Behaviors, Not Benchmarks
Pranav Bhagat, K N Ajay Shastry, Pranoy Panda et al.
Evaluating Conversational Agents with Persona-driven User Simulations based on Large Language Models: A Sales Bot Case Study
Justyna Gromada, Alicja Kasicka, Ewa Komkowska et al.
Evaluating Cultural Knowledge and Reasoning in LLMs Through Persian Allusions
Melika Nobakhtian, Yadollah Yaghoobzadeh, Mohammad Taher Pilehvar
Evaluating distillation methods for data-efficient syntax learning
Takateru Yamakoshi, Thomas L. Griffiths, R. Thomas McCoy et al.
Evaluating Evaluation Metrics – The Mirage of Hallucination Detection
Atharva Kulkarni, Yuan Zhang, Joel Ruben Antony Moniz et al.
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
Xuyang Wu, Yuan Wang, Hsin-Tai Wu et al.
Evaluating Health Question Answering Under Readability-Controlled Style Perturbations
Md Mushfiqur Rahman, Kevin Lybarger
Evaluating Language Translation Models by Playing Telephone
Syeda Jannatus Saba, Steven Skiena
Evaluating Large Language Models for Belief Inference: Mapping Belief Networks at Scale
Trisevgeni Papakonstantinou, Antonina Zhiteneva, Ana Yutong Ma et al.
Evaluating Large Language Models for Cross-Lingual Retrieval
Longfei Zuo, Pingjun Hong, Oliver Kraus et al.
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel, Hrudayangam Mehta, Jeremy Blackburn
Evaluating LLM-Generated Diagrams as Graphs
Chumeng Liang, Jiaxuan You
Evaluating LLM-Generated Legal Explanations for Regulatory Compliance in Social Media Influencer Marketing
Haoyang Gui, Thales Bertaglia, Taylor Annabell et al.