Papers
Analyze, Generate and Refine: Query Expansion with LLMs for Zero-Shot Open-Domain QA
Xinran Chen, Xuanang Chen, Ben He et al.
LC4EE: LLMs as Good Corrector for Event Extraction
Mengna Zhu, Kaisheng Zeng, Jibing Wu et al.
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs’ Pragmatics Capabilities
Settaluri Sravanthi, Meet Doshi, Pavan Tankala et al.
Probing the Emergence of Cross-lingual Alignment during LLM Training
Hetong Wang, Pasquale Minervini, Edoardo Ponti
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang, Yahan Yu, Jiahua Dong et al.
Countering Reward Over-Optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita, Florian Strub, Rahma Chaabouni et al.
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu, Nafise Moosavi, Chenghua Lin
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie et al.
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
Neeraj Varshney, Pavel Dolin, Agastya Seth et al.
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs’ Mathematical Reasoning Capabilities
Yujun Mao, Yoon Kim, Yilun Zhou
Proving membership in LLM pretraining data via data watermarks
Johnny Wei, Ryan Wang, Robin Jia
RaDA: Retrieval-augmented Web Agent Planning with LLMs
Minsoo Kim, Victor Bursztyn, Eunyee Koh et al.
Competition-Level Problems are Effective LLM Evaluators
Yiming Huang, Zhenghao Lin, Xiao Liu et al.
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation
Demin Song, Honglin Guo, Yunhua Zhou et al.
CR-LLM: A Dataset and Optimization for Concept Reasoning of Large Language Models
Nianqi Li, Jingping Liu, Sihang Jiang et al.
LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen, Hassan Mansoor, Victor Carbune et al.
Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics
Zhu Liu, Cunliang Kong, Ying Liu et al.
Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Jingcong Liang, Rong Ye, Meng Han et al.
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong, Quan Tu, Changyu Chen et al.
Combining Hierarchical VAEs with LLMs for clinically meaningful timeline summarisation in social media
Jiayu Song, Jenny Chim, Adam Tsakalidis et al.
S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs
Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan et al.
Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning
Yang Wu, Chenghao Wang, Ece Gumusel et al.
Hire a Linguist!: Learning Endangered Languages in LLMs with In-Context Linguistic Descriptions
Kexun Zhang, Yee Choi, Zhenqiao Song et al.
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation
Ali Malik, Stephen Mayhew, Christopher Piech et al.