Papers
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Kimberly Truong, Riccardo Fogliato, Hoda Heidari et al.
Job Unfair: An Investigation of Gender and Occupational Bias in Free-Form Text Completions by LLMs
Camilla Casula, Sebastiano Vecellio Salto, Elisa Leonardelli et al.
Understanding LLMs’ Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From
Changjiang Gao, Hankun Lin, Xin Huang et al.
Exploring the Hidden Capacity of LLMs for One-Step Text Generation
Gleb Mezentsev, Ivan Oseledets
DCR: Quantifying Data Contamination in LLMs Evaluation
Cheng Xu, Nan Yan, Shuhao Guan et al.
Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency
Svetlana Maslenkova, Clement Christophe, Marco AF Pimentel et al.
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
Yang Wang, Chenghao Xiao, Chia-Yi Hsiao et al.
InterIDEAS: Philosophical Intertextuality via LLMs
Yue Yang, Yinzhi Xu, Chenghao Huang et al.
GER-LLM: Efficient and Effective Geospatial Entity Resolution with Large Language Model
Haojia Zhu, Zhicheng Li, Jiahui Jin
RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
Alberto Testoni, Barbara Plank, Raquel Fernández
Rethinking Text-based Protein Understanding: Retrieval or LLM?
Juntong Wu, Zijing Liu, He Cao et al.
Easy as PIE? Identifying Multi-Word Expressions with LLMs
Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar
Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning
Yicong Wu, Guangyue Lu, Yuan Zuo et al.
Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration
Weicheng Ma, John J. Guerrerio, Soroush Vosoughi
LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
Harry Mayne, Ryan Othniel Kearns, Yushi Yang et al.
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Jean De Dieu Nyandwi, Yueqi Song, Simran Khanuja et al.
NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol et al.
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
From Language to Cognition: How LLMs Outgrow the Human Language Network
Badr AlKhamissi, Greta Tuckute, Yingtian Tang et al.
Hallucination Detection in LLMs Using Spectral Features of Attention Maps
Jakub Binkowski, Denis Janiak, Albert Sawczyn et al.
Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts
Yuho Lee, Jiaqi Deng, Nicole Hee-Yeon Kim et al.
Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey
Katerina Korre, Dimitris Tsirmpas, Nikos Gkoumas et al.
From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test
Xunlian Dai, Li Zhou, Benyou Wang et al.
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao, Jiahe Guo, Yulin Hu et al.
TFDP: Token-Efficient Disparity Audits for Autoregressive LLMs via Single-Token Masked Evaluation
Inderjeet Singh, Ramya Srinivasan, Roman Vainshtein et al.