Papers
F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations
Tian Lan, Jiang Li, Yemin Wang et al.
CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages
Yilun Yang, Yekun Chai
Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio
Yiran Yang, Haifeng Sun, Jingyu Wang et al.
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
Melissa Roemmele, John Joon Young Chung, Taewook Kim et al.
FilBench: Can LLMs Understand and Generate Filipino?
Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel et al.
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang, Tianyang Liu, Daoan Zhang et al.
User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
Yuhan Liu, Michael JQ Zhang, Eunsol Choi
Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs
Yu-Wen Chen, Melody Ma, Julia Hirschberg
Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu et al.
Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem
Yubo Wang, Ping Nie, Kai Zou et al.
SAND: Boosting LLM Agents with Self-Taught Action Deliberation
Yu Xia, Yiran Jenny Shen, Junda Wu et al.
LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment
Lingyao Li, Dawei Li, Zhenhui Ou et al.
Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
Hua Shen, Nicholas Clark, Tanu Mitra
FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4
Jiarui Yao, Ruida Wang, Tong Zhang
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu et al.
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
Heming Xia, Chak Tou Leong, Wenjie Wang et al.
Why Do Some Inputs Break Low-Bit LLM Quantization?
Ting-Yun Chang, Muru Zhang, Jesse Thomason et al.
Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs
Mahammed Kamruzzaman, Gene Louis Kim
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
Wanlong Liu, Junying Chen, Ke Ji et al.
SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
Xudong Lu, Haohao Gao, Renshou Wu et al.
Multimedia Event Extraction with LLM Knowledge Editing
Jiaao Yu, Yijing Lin, Zhipeng Gao et al.
Exploring the Impact of Personality Traits on LLM Bias and Toxicity
Shuo Wang, Renhao Li, Xi Chen et al.
BannerAgency: Advertising Banner Design with Multimodal LLM Agents
Heng Wang, Yotaro Shimose, Shingo Takamatsu
Training LLMs to be Better Text Embedders through Bidirectional Reconstruction
Chang Su, Dengliang Shi, Siyuan Huang et al.
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
Ziyue Liu, Ruijie Zhang, Zhengyang Wang et al.