Papers

2,781 papers found
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng, Chengsong Huang, Banghua Zhu et al.
2025 ICLR
2025 ICLR
PiCO: Peer Review in LLMs based on Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.
2025 ICLR
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
2025 ICLR
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng et al.
2025 ICLR
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
2025 ICLR
Aligned LLMs Are Not Aligned Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.
2025 ICLR
2025 ICLR
Do LLMs ``know'' internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
2025 ICLR
Do LLMs have Consistent Values?
Naama Rozen, Liat Bezalel, Gal Elidan et al.
2025 ICLR
2025 ICLR
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.
2025 ICLR
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.
2025 ICLR
2025 ICLR
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.
2025 ICLR
Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind et al.
2025 ICLR
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
2025 ICLR
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.
2025 ICLR
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
2025 ICLR
DON’T STOP ME NOW: EMBEDDING BASED SCHEDULING FOR LLMS
Rana Shahout, eran malach, Chunwei Liu et al.
2025 ICLR
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
2025 ICLR