Papers
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation
Robert Litschko, Max Müller-Eberstein, Rob van der Goot et al.
Estimating Large Language Model Capabilities without Labeled Test Data
Harvey Fu, Qinyuan Ye, Albert Xu et al.
e-THERAPIST: I suggest you to cultivate a mindset of positivity and nurture uplifting thoughts
Kshitij Mishra, Priyanshu Priya, Manisha Burja et al.
Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs
Abhinav Rao, Aditi Khandelwal, Kumar Tanmay et al.
EtiCor: Corpus for Analyzing LLMs for Etiquettes
Ashutosh Dwivedi, Pradhyumna Lavania, Ashutosh Modi
Euphemistic Abuse – A New Dataset and Classification Experiments for Implicitly Abusive Language
Michael Wiegand, Jana Kampfmeier, Elisabeth Eder et al.
Evaluating and Enhancing the Robustness of Code Pre-trained Models through Structure-Aware Adversarial Samples Generation
Nuo Chen, Qiushi Sun, Jianing Wang et al.
Evaluating and Modeling Attribution for Cross-Lingual Question Answering
Benjamin Muller, John Wieting, Jonathan H. Clark et al.
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Laura Cabello, Emanuele Bugliarello, Stephanie Brandl et al.
Evaluating ChatGPT and Bard AI on Arabic Sentiment Analysis
Abdulmohsen Al-Thubaity, Sakhar Alkhereyf, Hanan Murayshid et al.
Evaluating Cross-Domain Text-to-SQL Models and Benchmarks
Mohammadreza Pourreza, Davood Rafiei
Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness
Zichao Li, Ines Arous, Siva Reddy et al.
Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis
Daniela Teodorescu, Saif Mohammad
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory
Ziang Xiao, Susu Zhang, Vivian Lai et al.
Evaluating Large Language Models on Controlled Generation Tasks
Jiao Sun, Yufei Tian, Wangchunshu Zhou et al.
Evaluating Metrics for Document-context Evaluation in Machine Translation
Vikas Raunak, Tom Kocmi, Matt Post
Evaluating Neural Language Models as Cognitive Models of Language Acquisition
Héctor Javier Vázquez Martínez, Annika Heuser, Charles Yang et al.
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li, Yifan Du, Kun Zhou et al.
Evaluating Parameter-Efficient Finetuning Approaches for Pre-trained Models on the Financial Domain
Isabella Olariu, Cedric Lothritz, Jacques Klein et al.
Evaluating Subjective Cognitive Appraisals of Emotions from Large Language Models
Hongli Zhan, Desmond C. Ong, Junyi Jessy Li
Evaluating the Knowledge Base Completion Potential of GPT
Blerta Veseli, Simon Razniewski, Jan-Christoph Kalo et al.
Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension
Akira Kawabata, Saku Sugawara
Evaluating Transformer’s Ability to Learn Mildly Context-Sensitive Languages
Shunjie Wang, Shane Steinert-Threlkeld
Evaluating Verifiability in Generative Search Engines
Nelson Liu, Tianyi Zhang, Percy Liang
Evaluation Metrics in the Era of GPT-4: Reliably Evaluating Large Language Models on Sequence to Sequence Tasks
Andrea Sottana, Bin Liang, Kai Zou et al.