Papers
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu, Nafise Moosavi, Chenghua Lin
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie et al.
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
Neeraj Varshney, Pavel Dolin, Agastya Seth et al.
Leveraging LLM Reasoning Enhances Personalized Recommender Systems
Alicia Tsai, Adam Kraft, Long Jin et al.
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs’ Mathematical Reasoning Capabilities
Yujun Mao, Yoon Kim, Yilun Zhou
Proving membership in LLM pretraining data via data watermarks
Johnny Wei, Ryan Wang, Robin Jia
RaDA: Retrieval-augmented Web Agent Planning with LLMs
Minsoo Kim, Victor Bursztyn, Eunyee Koh et al.
Competition-Level Problems are Effective LLM Evaluators
Yiming Huang, Zhenghao Lin, Xiao Liu et al.
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
Demin Song, Honglin Guo, Yunhua Zhou et al.
CR-LLM: A Dataset and Optimization for Concept Reasoning of Large Language Models
Nianqi Li, Jingping Liu, Sihang Jiang et al.
LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen, Hassan Mansoor, Victor Carbune et al.
Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
Zhu Liu, Cunliang Kong, Ying Liu et al.
Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Jingcong Liang, Rong Ye, Meng Han et al.
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong, Quan Tu, Changyu Chen et al.
Combining Hierarchical VAEs with LLMs for clinically meaningful timeline summarisation in social media
Jiayu Song, Jenny Chim, Adam Tsakalidis et al.
S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs
Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan et al.
Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning
Yang Wu, Chenghao Wang, Ece Gumusel et al.
Hire a Linguist!: Learning Endangered Languages in LLMs with In-Context Linguistic Descriptions
Kexun Zhang, Yee Choi, Zhenqiao Song et al.
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
Ali Malik, Stephen Mayhew, Christopher Piech et al.
A Critical Study of What Code-LLMs (Do Not) Learn
Abhinav Anand, Shweta Verma, Krishna Narasimhan et al.
Defending LLMs against Jailbreaking Attacks via Backtranslation
Yihan Wang, Zhouxing Shi, Andrew Bai et al.
Ask LLMs Directly, “What shapes your bias?”: Measuring Social Bias in Large Language Models
Jisu Shin, Hoyun Song, Huije Lee et al.
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li, Lichang Chen, Jiuhai Chen et al.
Selective Prompting Tuning for Personalized Conversations with LLMs
Qiushi Huang, Xubo Liu, Tom Ko et al.