Papers
Exploring the Impact of Language Switching on Personality Traits in LLMs
Jacopo Amidei, Jose Gregorio Ferreira De Sá, Rubén Nieto Luna et al.
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation
Keheng Wang, Feiyu Duan, Peiguang Li et al.
LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback
Yaqi Zhang, Viktor Hangya, Alexander Fraser
ALYMPICS: LLM Agents Meet Game Theory
Shaoguang Mao, Yuzhe Cai, Yan Xia et al.
Intention Analysis Makes LLMs A Good Jailbreak Defender
Yuqi Zhang, Liang Ding, Lefei Zhang et al.
LLM Sensitivity Evaluation Framework for Clinical Diagnosis
Chenwei Yan, Xiangling Fu, Yuxuan Xiong et al.
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
Siyuan Wang, Zhuohan Long, Zhihao Fan et al.
Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection
Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina
Finetuning LLMs for Comparative Assessment Tasks
Vatsal Raina, Adian Liusie, Mark Gales
Hermit Kingdom Through the Lens of Multiple Perspectives: A Case Study of LLM Hallucination on North Korea
Eunjung Cho, Won Ik Cho, Soomin Seo
HLU: Human Vs LLM Generated Text Detection Dataset for Urdu at Multiple Granularities
Iqra Ali, Jesse Atuhurra, Hidetaka Kamigaito et al.
ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary
Yutong Li, Lu Chen, Aiwei Liu et al.
Revisiting Implicitly Abusive Language Detection: Evaluating LLMs in Zero-Shot and Few-Shot Settings
Julia Jaremko, Dagmar Gromann, Michael Wiegand
Can LLMs Clarify? Investigation and Enhancement of Large Language Models on Argument Claim Optimization
Yiran Wang, Ben He, Xuanang Chen et al.
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Basel Mousi, Nadir Durrani, Fatema Ahmad et al.
How Credible Is an Answer From Retrieval-Augmented LLMs? Investigation and Evaluation With Multi-Hop QA
Yujia Zhou, Zheng Liu, Zhicheng Dou
Is Parameter Collision Hindering Continual Learning in LLMs?
Shuo Yang, Kun-Peng Ning, Yu-Yang Liu et al.
Large Language Models are good multi-lingual learners: When LLMs meet cross-lingual prompts
Teng Wang, Zhenqi He, Wing-Yin Yu et al.
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
Mohammad Aflah Khan, Neemesh Yadav, Sarah Masud et al.
What’s the most important value? INVP: INvestigating the Value Priorities of LLMs through Decision-making in Social Scenarios
Xuelin Liu, Pengyuan Liu, Dong Yu
BasqBBQ: A QA Benchmark for Assessing Social Biases in LLMs for Basque, a Low-Resource Language
Muitze Zulaika, Xabier Saralegi
Interactive Evaluation for Medical LLMs via Task-oriented Dialogue System
Ruoyu Liu, Kui Xue, Xiaofan Zhang et al.
Extracting structure from an LLM - how to improve on surprisal-based models of Human Language Processing
Daphne P. Wang, Mehrnoosh Sadrzadeh, Miloš Stanojević et al.
What Makes Cryptic Crosswords Challenging for LLMs?
Abdelrahman Sadallah, Daria Kotova, Ekaterina Kochmar