Papers
Reading between the Lines: Can LLMs Identify Cross-Cultural Communication Gaps?
Sougata Saha, Saurabh Kumar Pandey, Harshit Gupta et al.
Understanding LLMs’ Fluid Intelligence Deficiency: An Analysis of the ARC Task
Junjie Wu, Mo Yu, Lemao Liu et al.
Substance Beats Style: Why Beginning Students Fail to Code with LLMs
Francesca Lucchetti, Zixuan Wu, Arjun Guha et al.
Reverse Thinking Makes LLMs Stronger Reasoners
Justin Chen, Zifeng Wang, Hamid Palangi et al.
SLM-Mod: Small Language Models Surpass LLMs at Content Moderation
Xianyang Zhan, Agam Goyal, Yilun Chen et al.
Dynamic Uncertainty Ranking: Enhancing Retrieval-Augmented In-Context Learning for Long-Tail Knowledge in LLMs
Shuyang Yu, Runxue Bao, Parminder Bhatia et al.
Is a Peeled Apple Still Red? Evaluating LLMs’ Ability for Conceptual Combination with Property Type
Seokwon Song, Taehyun Lee, Jaewoo Ahn et al.
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge
Li Zhou, Taelin Karidi, Wanlong Liu et al.
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park et al.
LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?
Jan Cegin, Jakub Simko, Peter Brusilovsky
Self-Training Meets Consistency: Improving LLMs’ Reasoning with Consistency-Driven Rationale Evaluation
Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra, Siddharth Sriraman, Gaurav Verma et al.
Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning
Yilun Zhao, Guo Gan, Chengye Wang et al.
AutoParLLM: GNN-guided Context Generation for Zero-Shot Code Parallelization using LLMs
Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Hung D Phan et al.
Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim, Yiming Yang
Prompting with Phonemes: Enhancing LLMs’ Multilinguality for Non-Latin Script Languages
Hoang H Nguyen, Khyati Mahajan, Vikas Yadav et al.
EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs
Sam Lin, Wenyue Hua, Zhenting Wang et al.
Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque
Ander Corral, Ixak Sarasua Antero, Xabier Saralegi
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
Paul Youssef, Zhixue Zhao, Jörg Schlötterer et al.
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian
Erfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh et al.
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Anna Arias-Duart, Pablo Agustin Martin-Torres, Daniel Hinjos et al.
Using Contextually Aligned Online Reviews to Measure LLMs’ Performance Disparities Across Language Varieties
Zixin Tang, Chieh-Yang Huang, Tsung-che Li et al.
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
Forrest Sheng Bao, Miaoran Li, Renyi Qu et al.
Explore the Reasoning Capability of LLMs in the Chess Testbed
Shu Wang, Lei Ji, Renxi Wang et al.
Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav et al.