conftrace_

Papers

5,914 papers found · incl. 435 without abstracts
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng, Chengsong Huang, Banghua Zhu et al.
2025 ICLR
PiCO: Peer Review in LLMs based on Consistency Optimization
Kun-Peng Ning, Shuo Yang, Yuyang Liu et al.
2025 ICLR
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
2025 ICLR
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
Tongtian Yue, Longteng Guo, Jie Cheng et al.
2025 ICLR
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
2025 ICLR
Catastrophic Failure of LLM Unlearning via Quantization
Zhiwei Zhang, Fali Wang, Xiaomin Li et al.
2025 ICLR
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
2025 ICLR
Aligned LLMs Are Not Aligned Browser Agents
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar et al.
2025 ICLR
Do LLMs "know" internally when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar et al.
2025 ICLR
Learning Dynamics of LLM Finetuning
Yi Ren, Danica J. Sutherland
2025 ICLR
Do LLMs have Consistent Values?
Naama Rozen, Liat Bezalel, Gal Elidan et al.
2025 ICLR
BadRobot: Jailbreaking Embodied LLM Agents in the Physical World
Hangtao Zhang, Chenyu Zhu, Xianlong Wang et al.
2025 ICLR
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
Jonas Hübotter, Sascha Bongni, Ido Hakimi et al.
2025 ICLR
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi-Tazehozi, Leonard Hussenot et al.
2025 ICLR
Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park et al.
2025 ICLR
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye et al.
2025 ICLR