Papers
Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection
Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina
Finetuning LLMs for Comparative Assessment Tasks
Vatsal Raina, Adian Liusie, Mark Gales
Revisiting Implicitly Abusive Language Detection: Evaluating LLMs in Zero-Shot and Few-Shot Settings
Julia Jaremko, Dagmar Gromann, Michael Wiegand
Can LLMs Clarify? Investigation and Enhancement of Large Language Models on Argument Claim Optimization
Yiran Wang, Ben He, Xuanang Chen et al.
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Basel Mousi, Nadir Durrani, Fatema Ahmad et al.
How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Evaluation With Multi-Hop QA
Yujia Zhou, Zheng Liu, Zhicheng Dou
Is Parameter Collision Hindering Continual Learning in LLMs?
Shuo Yang, Kun-Peng Ning, Yu-Yang Liu et al.
Large Language Models are good multi-lingual learners: When LLMs meet cross-lingual prompts
Teng Wang, Zhenqi He, Wing-Yin Yu et al.
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud et al.
What’s the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios
Xuelin Liu, Pengyuan Liu, Dong Yu
BasqBBQ: A QA Benchmark for Assessing Social Biases in LLMs for Basque, a Low-Resource Language
Muitze Zulaika, Xabier Saralegi
Interactive Evaluation for Medical LLMs via Task-oriented Dialogue System
Ruoyu Liu, Kui Xue, Xiaofan Zhang et al.
What Makes Cryptic Crosswords Challenging for LLMs?
Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar
Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs
Paul Lerner, François Yvon
LLMs meet Bloom’s Taxonomy: A Cognitive View on Large Language Model Evaluations
Thomas Huber, Christina Niklaus
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
Jia Gu, Liang Pang, Huawei Shen et al.
On Evaluating LLMs’ Capabilities as Functional Approximators: A Bayesian Evaluation Framework
Shoaib Ahmed Siddiqui, Yanzhi Chen, Juyeon Heo et al.
LLMs May Perform MCQA by Selecting the Least Incorrect Option
Haochun Wang, Sendong Zhao, Zewen Qiang et al.
Empirical Study on Data Attributes Insufficiency of Evaluation Benchmarks for LLMs
Chuang Liu, Renren Jin, Zheng Yao et al.
Multi-Layered Evaluation Using a Fusion of Metrics and LLMs as Judges in Open-Domain Question Answering
Rashin Rahnamoun, Mehrnoush Shamsfard
Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification
Shijing Chen, Mohamed Reda Bouadjenek, Usman Naseem et al.
Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with Humans and LLMs
Guillermo Marco, Luz Rello, Julio Gonzalo
Beyond Surprisal: A Dual Metric Framework for Lexical Skill Acquisition in LLMs
Nazanin Shafiabadi, Guillaume Wisniewski