Papers
Curse of Knowledge: Your Guidance and Provided Knowledge are biasing LLM Judges in Complex Evaluation
Weiyuan Li, Xintao Wang, Siyu Yuan et al.
Neutral Is Not Unbiased: Evaluating Implicit and Intersectional Identity Bias in LLMs Through Structured Narrative Scenarios
Saba Ghanbari Haez, Mauro Dragoni
Can LLMs Be Efficient Predictors of Conversational Derailment?
Kaustubh Olpadkar, Vikram Sunil Bajaj, Leslie Barrett
Factuality Beyond Coherence: Evaluating LLM Watermarking Methods for Medical Texts
Rochana Prih Hastuti, Rian Adam Rajagede, Mansour Al Ghanim et al.
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs
Yixiao Zhou, Ziyu Zhao, Dongzhou Cheng et al.
LLMs Can Compensate for Deficiencies in Visual Representations
Sho Takishita, Jay Gala, Abdelrahman Mohamed et al.
Exploring Paraphrasing Strategies for CEFR A1-Level Constraints in LLMs
Eugenio Marzona, Maria Goikhman, Alessio Palmero Aprosio et al.
Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction
Zhexiong Liu, Diane Litman
ConText-LE: Cross-Distribution Generalization for Longitudinal Experiential Data via Narrative-Based LLM Representations
Ahatsham Hayat, Bilal Khan, Mohammad Rashedul Hasan
ULTRABENCH: Benchmarking LLMs under Extreme Fine-grained Text Generation
Longfei Yun, Letian Peng, Jingbo Shang
The Price of Format: Diversity Collapse in LLMs
Longfei Yun, Chenyang An, Zilong Wang et al.
LLMs for Bayesian Optimization in Scientific Domains: Are We There Yet?
Rushil Gupta, Jason Hartford, Bang Liu
Can Multiple Responses from an LLM Reveal the Sources of Its Uncertainty?
Yang Nan, Pengfei He, Ravi Tandon et al.
Not Lost After All: How Cross-Encoder Attribution Challenges Position Bias Assumptions in LLM Summarization
Elahe Rahimi, Hassan Sajjad, Domenic Rosati et al.
MFTCXplain: A Multilingual Benchmark Dataset for Evaluating the Moral Reasoning of LLMs through Multi-hop Hate Speech Explanation
Jackson Trager, Francielle Vargas, Diego Alves et al.
Fine-tuning LLMs with Cross-Attention-based Weight Decay for Bias Mitigation
Farsheed Haque, Zhe Fu, Depeng Xu et al.
Profiling LLM’s Copyright Infringement Risks under Adversarial Persuasive Prompting
Jikai Long, Ming Liu, Xiusi Chen et al.
Post-hoc Study of Climate Microtargeting on Social Media Ads with LLMs: Thematic Insights and Fairness Evaluation
Tunazzina Islam, Dan Goldwasser
HetGCoT: Heterogeneous Graph-Enhanced Chain-of-Thought LLM Reasoning for Academic Question Answering
Runsong Jia, Mengjia Wu, Ying Ding et al.
FSTs vs ICL: Generalisation in LLMs for an under-resourced language
Ximena Gutierrez, Mikel Segura Elizalde, Victor Mijangos
Benchmarking and Improving LLM Robustness for Personalized Generation
Chimaobi Okite, Naihao Deng, Kiran Bodipati et al.
Hallucination Detection in Structured Query Generation via LLM Self-Debating
Miaoran Li, Jiangning Chen, Minghua Xu et al.
Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs
Kuan Lok Zhou, Jiayi Chen, Siddharth Suresh et al.
DrKGC: Dynamic Subgraph Retrieval-Augmented LLMs for Knowledge Graph Completion across General and Biomedical Domains
Yongkang Xiao, Sinian Zhang, Yi Dai et al.
When Instructions Multiply: Measuring and Estimating LLM Capabilities of Multiple Instructions Following
Keno Harada, Yudai Yamazaki, Masachika Taniguchi et al.