Research Explorer

Taming Overconfidence in LLMs: Reward Calibration in RLHF

Jixuan Leng, Chengsong Huang, Banghua Zhu et al.

2025 ICLR

A Benchmark for Semantic Sensitive Information in LLMs Outputs

Qingjie Zhang, Han Qiu, Di Wang et al.

2025 ICLR

PiCO: Peer Review in LLMs based on Consistency Optimization

Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.

2025 ICLR

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language

Erik Jones, Arjun Patrawala, Jacob Steinhardt

2025 ICLR

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Tongtian Yue, Longteng Guo, Jie Cheng et al.

2025 ICLR

Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only

Jihan Yao, Wenxuan Ding, Shangbin Feng et al.

2025 ICLR

Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs

Xuandong Zhao, Lei Li, Yu-Xiang Wang

2025 ICLR

Aligned LLMs Are Not Aligned Browser Agents

Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.

2025 ICLR

MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs

Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.

2025 ICLR

Do LLMs "know" internally when they follow instructions?

Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.

2025 ICLR

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Haiyan Zhao, Heng Zhao, Bo Shen et al.

2025 ICLR

Do LLMs have Consistent Values?

Naama Rozen, Liat Bezalel, Gal Elidan et al.

2025 ICLR

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

Bowen Jin, Jinsung Yoon, Jiawei Han et al.

2025 ICLR

Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.

2025 ICLR

BOND: Aligning LLMs with Best-of-N Distillation

Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.

2025 ICLR

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.

2025 ICLR

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.

2025 ICLR

Shh, don't say that! Domain Certification in LLMs

Cornelius Emde, Alasdair Paren, Preetham Arvind et al.

2025 ICLR

Transformer-Squared: Self-adaptive LLMs

Qi Sun, Edoardo Cetin, Yujin Tang

2025 ICLR

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.

2025 ICLR

Do LLMs estimate uncertainty well in instruction-following?

Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.

2025 ICLR

Don't Stop Me Now: Embedding Based Scheduling for LLMs

Rana Shahout, Eran Malach, Chunwei Liu et al.

2025 ICLR

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

2025 ICLR

Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?

Jonathan Roberts, Kai Han, Samuel Albanie

2025 ICLR

PAD: Personalized Alignment of LLMs at Decoding-time

Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.

2025 ICLR
