Papers

5,479 papers found
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs
Oluwanifemi Bamgbose, Masoud Hashemi, Sathwik Tejaswi Madhusudhan et al.
2026 AAAI
Silenced Biases: The Dark Side LLMs Learned to Refuse
Rom Himelstein, Amit LeVi, Brit Youngmann et al.
2026 AAAI
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
Shigeki Kusaka, Keita Saito, Mikoto Kudo et al.
2026 AAAI
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
Jea Kwon, Luiz Felipe Vecchietti, Sungwon Park et al.
2026 AAAI
STACK: Adversarial Attacks on LLM Safeguard Pipelines
Ian R. McKenzie, Oskar John Hollinsworth, Tom Tseng et al.
2026 AAAI
AdvBDGen: A Robust Framework for Generating Adaptive and Stealthy Backdoors in LLM Alignment
Pankayaraj Pathmanathan, Udari Madhushani Sehwag, Michael-Andrei Panaitescu-Liess et al.
2026 AAAI
STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
Zijun Wang, Haoqin Tu, Yuhan Wang et al.
2026 AAAI