Papers
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Hui Wei, Zihao Zhang, Shenghua He et al.
IAM: Efficient Inference through Attention Mapping between Different-scale LLMs
Yi Zhao, Zuchao Li, Hai Zhao
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
Nathanaël Carraz Rakotonirina, Mohammed Hamdy, Jon Ander Campos et al.
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
Junxiao Yang, Zhexin Zhang, Shiyao Cui et al.
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen, Seraphina Goldfarb-Tarrant
Vulnerability of LLMs to Vertically Aligned Text Manipulations
Zhecheng Li, Yiwei Wang, Bryan Hooi et al.
AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
Nicholas E. Corrado, Julian Katz-Samuels, Adithya M Devraj et al.
Understanding Silent Data Corruption in LLM Training
Jeffrey Jian Ma, Hengzhi Pei, Leonard Lausen et al.
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li, Jiarui Liu, Andy Liu et al.
Amplifying Trans and Nonbinary Voices: A Community-Centred Harm Taxonomy for LLMs
Eddie L. Ungless, Sunipa Dev, Cynthia L. Bennett et al.
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers
Zhijian Xu, Yilun Zhao, Manasi Patwardhan et al.
Navigating Rifts in Human-LLM Grounding: Study and Benchmark
Omar Shaikh, Hussein Mozannar, Gagan Bansal et al.
Structural Reasoning Improves Molecular Understanding of LLM
Yunhui Jang, Jaehyung Kim, Sungsoo Ahn
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
Tingyu Song, Tongyan Hu, Guo Gan et al.
On Generalization across Measurement Systems: LLMs Entail More Test-Time Compute for Underrepresented Cultures
Minh Duc Bui, Kyung Eun Park, Goran Glavaš et al.
Veracity Bias and Beyond: Uncovering LLMs’ Hidden Beliefs in Problem-Solving Reasoning
Yue Zhou, Barbara Di Eugenio
LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
Dongil Yang, Minjin Kim, Sunghwan Kim et al.
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
Junjie Chu, Yugeng Liu, Ziqing Yang et al.
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
Zhenyu Wu, Qingkai Zeng, Zhihan Zhang et al.
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Yupu Hao, Pengfei Cao, Zhuoran Jin et al.
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
Xinghua Zhang, Haiyang Yu, Cheng Fu et al.
Flipping Knowledge Distillation: Leveraging Small Models’ Expertise to Enhance LLMs in Text Matching
Mingzhe Li, Jing Xiang, Qishen Zhang et al.
CoreEval: Automatically Building Contamination-Resilient Datasets with Real-World Knowledge toward Reliable LLM Evaluation
Jingqian Zhao, Bingbing Wang, Geng Tu et al.
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
Xinbei Ma, Yiting Wang, Yao Yao et al.