Papers
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
EMNLP 2025
Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
NAACL 2025
Vulnerability of Large Language Models to Output Prefix Jailbreaks: Impact of Positions on Safety
NAACL 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
EMNLP 2025