Papers
Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs
Rahul Thapa, Qingyang Wu, Kevin Wu et al.
AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs
Busayo Awobade, Mardhiyah Sanni, Tassallah Abdullahi et al.
PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank, Haithem Afli
How Good Are LLMs at Processing Tool Outputs?
Kiran Kate, Yara Rizk, Poulami Ghosh et al.
Tug-of-war between idioms’ figurative and literal interpretations in LLMs
Soyoung Oh, Xinting Huang, Mathis Pink et al.
MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection
Arkadiusz Modzelewski, Witold Sosnowski, Eleni Papadopulos et al.
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
Abeer Badawi, Elahe Rahimi, Md Tahmid Rahman Laskar et al.
Word Surprisal Correlates with Sentential Contradiction in LLMs
Ning Shi, Bradley Hauer, David Basil et al.
Where Do LLMs Compose Meaning? A Layerwise Analysis of Compositional Robustness
Nura Aljaafari, Danilo Carvalho, Andre Freitas
Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding
Wafaa Mohammed, Vlad Niculae, Chrysoula Zerva
Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLMs
Everlyn Asiko Chimoto, Mostafa Elhoushi, Bruce Bassett
Can you map it to English? The Role of Cross-Lingual Alignment in the Multilingual Performance of LLMs
Kartik Ravisankar, HyoJung Han, Sarah Wiegreffe et al.
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
Hyeseon An, Shinwoo Park, Suyeon Woo et al.
From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMs
Suyash Fulay, Jocelyn Zhu, Michiel A. Bakker
Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible
Imry Ziv, Nur Lan, Emmanuel Chemla
Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
Alireza Dehghanpour Farashah, Aditi Khandelwal, Marylou Fauchard et al.
Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs
Yuxuan Jiang, Francis Ferraro
Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis
Disha Makhija, Manoj Ghuhan Arivazhagan, Vinayshekhar Bannihatti Kumar et al.
Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
Yiyang Feng, Zeming Chen, Haotian Wu et al.
Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
Jingyi Chen, Zhimeng Guo, Jiyun Chun et al.
Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?
Karin de Langis, Püren Öncel, Ryan Peters et al.
Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs
Karin de Langis, Jong Inn Park, Bin Hu et al.
Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval over haystacks
Amey Hengle, Prasoon Bajpai, Soham Dan et al.
Knowing When to Abstain: Medical LLMs Under Clinical Uncertainty
Sravanthi Machcha, Sushrita Yerra, Sahil Gupta et al.