Xiangyu Qi

14 papers · 2021–2025 · 7 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (14) 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (14)

👥 Mega-Team (35) 👑 Triple Crown 🏆 Keyword Champion (2) 🏆 Grand Slam 🔥 Unstoppable (5) ⚡ Prolific Year (5) 💎 Century Club (14)

Conferences

ICLR (6) ICML (3) AAAI (1) ACL (1) CVPR (1) NAACL (1) NIPS (1)

Top co-authors

Prateek Mittal (9) Tinghao Xie (7) Peter Henderson (6) Yangsibo Huang (3) Kaixuan Huang (3) Bo Li (3) Boyi Wei (3) Pin-Yu Chen (2) Mengdi Wang (2) Yiming Li (2)

Keywords

adversarial attack (3) jailbreak attack (2) safety guardrail (2) large language model (2) deep neural network (2) adversarial robustness (2) adversarial training (1) model safety (1) responsible ai (1) ai safety (1) safety alignment (1) distribution shift (1) first-order logic (1) domain knowledge (1) backdoor attack (1) probabilistic graphical model (1) adversarial defense (1) vision-language model (1) model deployment (1) fine-tuning attack (1)

Papers

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability · NAACL 2025
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks · ACL 2025
On Evaluating the Durability of Safeguards for Open-Weight LLMs · ICLR 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal · ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep · ICLR 2025
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! · ICLR 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection · ICLR 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications · ICML 2024
BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment · NIPS 2024
Visual Adversarial Examples Jailbreak Aligned Large Language Models · AAAI 2024
Revisiting the Assumption of Latent Separability for Backdoor Defenses · ICLR 2023
Uncovering Adversarial Risks of Test-Time Adaptation · ICML 2023
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks · CVPR 2022
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks · ICML 2021