Papers

2,781 papers found
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
2025 ICLR
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Sravanti Addepalli, Yerram Varun, Arun Suggala et al.
2025 ICLR
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Aiwei Liu, Sheng Guan, Yiming Liu et al.
2025 ICLR
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
Wentao Guo, Jikai Long, Yimeng Zeng et al.
2025 ICLR
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Sheryl Hsu, Omar Khattab, Chelsea Finn et al.
2025 ICLR
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng, Congying Xia, Xinyi Yang et al.
2025 ICLR
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi, Boyi Wei, Nicholas Carlini et al.
2025 ICLR
Transformer Block Coupling and its Correlation with Generalization in LLMs
Murdock Aubry, Haoming Meng, Anton Sugolov et al.
2025 ICLR
Tamper-Resistant Safeguards for Open-Weight LLMs
Rishub Tamirisa, Bhrugu Bharathi, Long Phan et al.
2025 ICLR