conftrace_

Papers

5,914 papers found · incl. 435 without abstracts
Active Task Disambiguation with LLMs
Kasia Kobalczyk, Nicolás Astorga, Tennison Liu et al.
2025 ICLR
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang et al.
2025 ICLR
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
2025 ICLR
RMB: Comprehensively benchmarking reward models in LLM alignment
Enyu Zhou, Guodong Zheng, Binghai Wang et al.
2025 ICLR
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You
2025 ICLR
ReCogLab: a framework testing relational reasoning & cognitive hypotheses on LLMs
Andrew Liu, Henry Prior, Gargi Balasubramaniam et al.
2025 ICLR
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Zhuowei Chen, Qiannan Zhang, Shichao Pei
2025 ICLR
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
Eunseop Yoon, Hee Suk Yoon, Mark A. Hasegawa-Johnson et al.
2025 ICLR
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
2025 ICLR
ELICIT: LLM Augmentation Via External In-context Capability
Futing Wang, Jianhao Yan, Yue Zhang et al.
2025 ICLR
Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Sravanti Addepalli, Yerram Varun, Arun Suggala et al.
2025 ICLR