Research Explorer

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Jixuan Leng, Chengsong Huang, Banghua Zhu et al.

2025 ICLR

A Benchmark for Semantic Sensitive Information in LLMs Outputs

Qingjie Zhang, Han Qiu, Di Wang et al.

2025 ICLR

PiCO: Peer Review in LLMs based on Consistency Optimization

Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.

2025 ICLR

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Erik Jones, Arjun Patrawala, Jacob Steinhardt

2025 ICLR

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Tongtian Yue, Longteng Guo, Jie Cheng et al.

2025 ICLR

MallowsPO: Fine-Tune Your LLM with Preference Dispersions

Haoxian Chen, Hanyang Zhao, Henry Lam et al.

2025 ICLR

Catastrophic Failure of LLM Unlearning via Quantization

Zhiwei Zhang, Fali Wang, Xiaomin Li et al.

2025 ICLR

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Jihan Yao, Wenxuan Ding, Shangbin Feng et al.

2025 ICLR

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Guangxuan Xiao, Jiaming Tang, Jingwei Zuo et al.

2025 ICLR

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs

Xuandong Zhao, Lei Li, Yu-Xiang Wang

2025 ICLR

Aligned LLMs Are Not Aligned Browser Agents

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.

2025 ICLR

DS-LLM: Leveraging Dynamical Systems to Enhance Both Training and Inference of Large Language Models

Ruibing Song, Chuan Liu, Chunshu Wu et al.

2025 ICLR

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.

2025 ICLR

Do LLMs "know" internally when they follow instructions?

Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.

2025 ICLR

Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference

Zongyue Qin, Ziniu Hu, Zifan He et al.

2025 ICLR

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Haiyan Zhao, Heng Zhao, Bo Shen et al.

2025 ICLR

Learning Dynamics of LLM Finetuning

Yi Ren, Danica J. Sutherland

2025 ICLR

Do LLMs have Consistent Values?

Naama Rozen, Liat Bezalel, Gal Elidan et al.

2025 ICLR

BadRobot: Jailbreaking Embodied LLM Agents in the Physical World

Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.

2025 ICLR

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

Bowen Jin, Jinsung Yoon, Jiawei Han et al.

2025 ICLR

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.

2025 ICLR

BOND: Aligning LLMs with Best-of-N Distillation

Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.

2025 ICLR

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.

2025 ICLR

Encryption-Friendly LLM Architecture

Donghwan Rho, Taeseong Kim, Minje Park et al.

2025 ICLR

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.

2025 ICLR
