Papers
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao, Pengfei Cao, Zhuoran Jin et al.
Evaluating Pretrained Causal Language Models for Synonymy
Ioana Ivan, Carlos Ramisch, Alexis Nasr
Evaluating Retrieval Augmented Generation to Communicate UK Climate Change Information
Arjun Biswas, Hatim Chahout, Tristan Pigram et al.
Evaluating Robustness of LLMs to Typographical Noise in Yorùbá QA
Paul Okewunmi, Favour James, Oluwadunsin Fajemila
Evaluating Sequence Labeling on the basis of Information Theory
Enrique Amigo, Elena Álvarez-Mellado, Julio Gonzalo et al.
Evaluating Structured Output Robustness of Small Language Models for Open Attribute-Value Extraction from Clinical Notes
Nikita Neveditsin, Pawan Lingras, Vijay Kumar Mago
Evaluating the Evaluation of Diversity in Commonsense Generation
Tianhui Zhang, Bei Peng, Danushka Bollegala
Evaluating the Long-Term Memory of Large Language Models
Zixi Jia, Qinghua Liu, Hexiao Li et al.
Evaluating Theory of (an uncertain) Mind: Predicting the Uncertain Beliefs of Others from Conversational Cues
Anthony Sicilia, Malihe Alikhani
Evaluating the Quality of Benchmark Datasets for Low-Resource Languages: A Case Study on Turkish
Elif Ecem Umutlu, Ayse Aysu Cengiz, Ahmet Kaan Sever et al.
Evaluating Tokenizer Adaptation Methods for Large Language Models on Low-Resource Programming Languages
Georgy Andryushchenko, Vladimir V. Ivanov
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ChaeHun Park, Yujin Baek, Jaeseok Kim et al.
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
Fan Zhang, Shulin Tian, Ziqi Huang et al.
Evaluation of Attribution Bias in Generator-Aware Retrieval-Augmented Large Language Models
Amin Abolghasemi, Leif Azzopardi, Seyyed Hadi Hashemi et al.
Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings
Gunjan Balde, Soumyadeep Roy, Mainack Mondal et al.
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation
Aneta Zugecova, Dominik Macko, Ivan Srba et al.
Event-based evaluation of abstractive news summarization
Huiling You, Samia Touileb, Lilja Øvrelid et al.
Event Pattern-Instance Graph: A Multi-Round Role Representation Learning Strategy for Document-Level Event Argument Extraction
Qizhi Wan, Tao Liu, Changxuan Wan et al.
EventRAG: Enhancing LLM Generation with Event Knowledge Graphs
Zairun Yang, Yilin Wang, Zhengyan Shi et al.
Evidence of Generative Syntax in LLMs
Mary Kennedy
EvoBench: Towards Real-world LLM-Generated Text Detection Benchmarking for Evolving Large Language Models
Xiao Yu, Yi Yu, Dongrui Liu et al.
EvolveBench: A Comprehensive Benchmark for Assessing Temporal Awareness in LLMs on Evolving Knowledge
Zhiyuan Zhu, Yusheng Liao, Zhe Chen et al.
EvoWiki: Evaluating LLMs on Evolving Knowledge
Wei Tang, Yixin Cao, Yang Deng et al.
Examining the Cultural Encoding of Gender Bias in LLMs for Low-Resourced African Languages
Abigail Oppong, Hellina Hailu Nigatu, Chinasa T. Okolo
Exclusion of Thought: Mitigating Cognitive Load in Large Language Models for Enhanced Reasoning in Multiple-Choice Tasks
Qihang Fu, Yongbin Qin, Ruizhang Huang et al.