Michael Backes

26 papers · 2019–2026 · 10 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (6) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (14)

🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (43) 👥 Mega-Team (71) 👑 Triple Crown 🤝 Dynamic Duo (16) 💎 Century Club (23) 🗃️ Keyword Collector (84) ⚡ Prolific Year (9) ❓ The Questioner (3)

Conferences

ACL (6) ICML (5) EMNLP (4) ICLR (4) NIPS (2) CVPR (1) ICCV (1) IJCAI (1) NAACL (1) WACV (1)

Top co-authors

Yang Zhang (19) Xinyue Shen (6) Franziska Boenisch (5) Adam Dziedzic (5) Yun Shen (4) Xinlei He (4) Yihan Ma (3) Wenhao Wang (3) Ning Yu (3) Yukun Jiang (3)

Research topics

Privacy (2) Differential Privacy (1)

Keywords

large language model (8) adversarial attack (5) security vulnerability (3) self-supervised learning (2) defense mechanism (2) generative model (2) diffusion model (2) prompt injection (2) contrastive learning (1) graph classification (1) privacy attack (1) transformer architecture (1) social media analysis (1) prompt engineering (1) text classification (1) data poisoning (1) lottery ticket hypothesis (1) model adaptation (1) deep learning (1) deepfake detection (1)

Papers

Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs · ACL 2026
Open Schrödinger’s Closed Box: Identifying Retrieval Augmented Generation in API-Accessible Large Language Model Services · ACL 2026
DE-CLIP: Few-Shot Anomaly Detection via Difference-Guided Embedding Editing · ACL 2026
Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media · ACL 2025
Captured by Captions: On Memorization and its Mitigation in CLIP Models · ICLR 2025
When GPT Spills the Tea: Comprehensive Assessment of Knowledge File Leakage in GPTs · ACL 2025
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs · ACL 2025
Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification · EMNLP 2025
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions · ICCV 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation · ICLR 2025
Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs · ICML 2025
Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing · ICML 2025
Memorization in Self-Supervised Learning Improves Downstream Generalization · ICLR 2024
Open LLMs are Necessary for Current Private Adaptations and Outperform their Closed Alternatives · NIPS 2024
Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models · EMNLP 2024
Localizing Memorization in SSL Vision Encoders · NIPS 2024
ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities · EMNLP 2024
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective · EMNLP 2024
Composite Backdoor Attacks Against Large Language Models · NAACL 2024
Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models · WACV 2024
Position: TrustLLM: Trustworthiness in Large Language Models · ICML 2024
Generated Graph Detection · ICML 2023
Data Poisoning Attacks Against Multimodal Encoders · ICML 2023
Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders · CVPR 2023
Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning? · ICLR 2023
Fairwalk: Towards Fair Graph Embedding · IJCAI 2019