Xiangyu Qi
14 papers · 2021–2025 · 7 top CS/AI conferences
Achievements
Cross-Pollinator (14)
Interdisciplinary Bridge
Hot Topic Early Bird
Conference Polyglot (7)
Taxonomy Completionist (14)
Mega-Team (35)
Triple Crown
Keyword Champion (2)
Grand Slam
Unstoppable (5)
Prolific Year (5)
Century Club (14)
Conferences
ICLR (6)
ICML (3)
AAAI (1)
ACL (1)
CVPR (1)
NAACL (1)
NIPS (1)
Keywords
adversarial attack (3)
jailbreak attack (2)
safety guardrail (2)
large language model (2)
deep neural network (2)
adversarial robustness (2)
adversarial training (1)
model safety (1)
responsible ai (1)
ai safety (1)
safety alignment (1)
distribution shift (1)
first-order logic (1)
domain knowledge (1)
backdoor attack (1)
probabilistic graphical model (1)
adversarial defense (1)
vision-language model (1)
model deployment (1)
fine-tuning attack (1)
Papers
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
NAACL 2025
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks
ACL 2025
On Evaluating the Durability of Safeguards for Open-Weight LLMs
ICLR 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
ICLR 2024
BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection
ICLR 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
NIPS 2024
Visual Adversarial Examples Jailbreak Aligned Large Language Models
AAAI 2024
Revisiting the Assumption of Latent Separability for Backdoor Defenses
ICLR 2023
Uncovering Adversarial Risks of Test-Time Adaptation
ICML 2023
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks
CVPR 2022
Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks
ICML 2021