Papers
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding, Zhiheng Xi, Wei He et al.
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Badr AlKhamissi, Greta Tuckute, Antoine Bosselut et al.
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra, Siddharth Sriraman, Gaurav Verma et al.
The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu, Lemao Liu, Junjie Wu et al.
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning
Yilun Zhao, Guo Gan, Chengye Wang et al.
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
Yanzhou Pan, Huawei Lin, Yide Ran et al.
AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs
Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan et al.
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su, Xuhui Zhou, Sanketh Rangreji et al.
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim, Yiming Yang
Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages
Hoang H Nguyen, Khyati Mahajan, Vikas Yadav et al.
JAWAHER: A Multidialectal Dataset of Arabic Proverbs for LLM Benchmarking
Samar Mohamed Magdy, Sang Yun Kwon, Fakhraddin Alwajih et al.
EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs
Sam Lin, Wenyue Hua, Zhenting Wang et al.
Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque
Ander Corral, Ixak Sarasua Antero, Xabier Saralegi
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
Paul Youssef, Zhixue Zhao, Jörg Schlötterer et al.
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian
Erfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh et al.
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao, Runhui Wang, Luyang Kong et al.
Complete Chess Games Enable LLM Become A Chess Master
Yinqi Zhang, Xintian Han, Haolong Li et al.
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
Nishant Balepur, Feng Gu, Abhilasha Ravichander et al.
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart, Pablo Agustin Martin-Torres, Daniel Hinjos et al.
STRUX: An LLM for Decision-Making with Structured Explanations
Yiming Lu, Yebowen Hu, Hassan Foroosh et al.
LLM2: Let Large Language Models Harness System 2 Reasoning
Cheng Yang, Chufan Shi, Siheng Li et al.
Using Contextually Aligned Online Reviews to Measure LLMs’ Performance Disparities Across Language Varieties
Zixin Tang, Chieh-Yang Huang, Tsung-che Li et al.
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference
You Wu, Haoyi Wu, Kewei Tu
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
Forrest Sheng Bao, Miaoran Li, Renyi Qu et al.
Explore the Reasoning Capability of LLMs in the Chess Testbed
Shu Wang, Lei Ji, Renxi Wang et al.