Papers

5,479 papers found
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Simran Dey, Gurpreet Gosal et al.
2025 ICLR
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.
2025 ICLR
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
2025 ICLR
Robust LLM safeguarding via refusal feature adversarial training
Lei Yu, Virginie Do, Karen Hambardzumyan et al.
2025 ICLR
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh et al.
2025 ICLR
Self-Evolved Reward Learning for LLMs
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
2025 ICLR
Why Does the Effective Context Length of LLMs Fall Short?
Chenxin An, Jun Zhang, Ming Zhong et al.
2025 ICLR
Tell me about yourself: LLMs are aware of their learned behaviors
Jan Betley, Xuchan Bao, Martín Soto et al.
2025 ICLR
SysBench: Can LLMs Follow System Message?
Yanzhao Qin, Tao Zhang, Tao Zhang et al.
2025 ICLR
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan, Matanel Oren, Yuval Reif et al.
2025 ICLR
HeadMap: Locating and Enhancing Knowledge Circuits in LLMs
Xuehao Wang, Liyuan Wang, Binghuai Lin et al.
2025 ICLR
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding, Jiancan Wu, Yancheng Yuan et al.
2025 ICLR
LLM Unlearning via Loss Adjustment with Only Forget Data
Yaxuan Wang, Jiaheng Wei, Chris Yuhao Liu et al.
2025 ICLR