Papers
LingGym: How Far Are LLMs from Thinking Like Field Linguists?
Changbing Yang, Franklin Ma, Freda Shi et al.
Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks
Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar et al.
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
Mengqi Liao, Xiangyu Xi, Chen Ruinian et al.
LLM Bias Detection and Mitigation through the Lens of Desired Distributions
Ingroj Shrestha, Padmini Srinivasan
Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu et al.
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
Beyond the Surface: Measuring Self-Preference in LLM Judgments
Zhi-Yuan Chen, Hao Wang, Xinyu Zhang et al.
Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation
Hengran Zhang, Minghao Tang, Keping Bi et al.
Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions
Lan Zhang, Marco Valentino, Andre Freitas
We Politely Insist: Your LLM Must Learn the Persian Art of Taarof
Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian et al.
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng, Xiaolong Jin, Jinyuan Jia et al.
F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations
Tian Lan, Jiang Li, Yemin Wang et al.
CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages
Yilun Yang, Yekun Chai
Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio
Yiran Yang, Haifeng Sun, Jingyu Wang et al.
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
Melissa Roemmele, John Joon Young Chung, Taewook Kim et al.
FilBench: Can LLMs Understand and Generate Filipino?
Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel et al.
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang, Tianyang Liu, Daoan Zhang et al.
User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
Yuhan Liu, Michael JQ Zhang, Eunsol Choi
Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs
Yu-Wen Chen, Melody Ma, Julia Hirschberg
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu et al.
Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem
Yubo Wang, Ping Nie, Kai Zou et al.
SAND: Boosting LLM Agents with Self-Taught Action Deliberation
Yu Xia, Yiran Jenny Shen, Junda Wu et al.
LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment
Lingyao Li, Dawei Li, Zhenhui Ou et al.
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
Hua Shen, Nicholas Clark, Tanu Mitra
FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4
Jiarui Yao, Ruida Wang, Tong Zhang