conftrace_

Papers

5,914 papers found · incl. 435 without abstracts
Dynamic Deep Prompt Optimization for Defending Against Jailbreak Attacks on LLMs
Doniyorkhon Obidov, Honggang Yu, Xiaolong Guo et al.
2026 AAAI
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
Chenyu Zhang, Lanjun Wang, Yiwen Ma et al.
2026 AAAI
SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
Yifan Zhang, Giridhar Ganapavarapu, Srideepika Jayaraman et al.
2026 AAAI
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs
Oluwanifemi Bamgbose, Masoud Hashemi, Sathwik Tejaswi Madhusudhan et al.
2026 AAAI
Silenced Biases: The Dark Side LLMs Learned to Refuse
Rom Himelstein, Amit LeVi, Brit Youngmann et al.
2026 AAAI
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
Shigeki Kusaka, Keita Saito, Mikoto Kudo et al.
2026 AAAI
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
Jea Kwon, Luiz Felipe Vecchietti, Sungwon Park et al.
2026 AAAI
STACK: Adversarial Attacks on LLM Safeguard Pipelines
Ian R. McKenzie, Oskar John Hollinsworth, Tom Tseng et al.
2026 AAAI
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment
Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess et al.
2026 AAAI