Papers
Unraveling the Mystery: Defending Against Jailbreak Attacks Via Unearthing Real Intention
COLING 2025
Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion
EMNLP 2025
Pandora's Box: Towards Building Universal Attackers against Real-World Large Vision-Language Models
NeurIPS 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
NeurIPS 2024
Explanation-based Training with Differentiable Insertion/Deletion Metric-aware Regularizers
AISTATS 2024
Gradient Coreset for Federated Learning
WACV 2024