Papers
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng, Chengsong Huang, Banghua Zhu et al.
A Benchmark for Semantic Sensitive Information in LLMs Outputs
Qingjie Zhang, Han Qiu, Di Wang et al.
PiCO: Peer Review in LLMs based on Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang, Fali Wang, Xiaomin Li et al.
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao, Lei Li, Yu-Xiang Wang
Aligned LLMs Are Not Aligned Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.
DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models
Ruibing Song, Chuan Liu, Chunshu Wu et al.
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.
Do LLMs "know" internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He et al.
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen et al.
Learning Dynamics of LLM Finetuning
Yi Ren, Danica J. Sutherland
Do LLMs have Consistent Values?
Naama Rozen, Liat Bezalel, Gal Elidan et al.
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Bowen Jin, Jinsung Yoon, Jiawei Han et al.
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.
Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park et al.
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.