Florian Tramer

39 papers · 2018–2026 · 4 top CS/AI conferences

Achievements

🌍 Conference Polyglot (3) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (28) 🧬 Topic Evolution 🤝 Dynamic Duo (22) 👥 Mega-Team (21) 👑 Triple Crown 🏆 Keyword Champion (2) 🗃️ Keyword Collector (67) ⚡ Prolific Year (9) 🚀 Conference Pioneer 🔥 Unstoppable (8) 💎 Century Club (38) ❓ The Questioner (2)

Conferences

ICLR (15) NIPS (12) ICML (11) ACL (1)

Top co-authors

Nicholas Carlini (22) Matthew Jagielski (9) Daphne Ippolito (8) Javier Rando (8) Milad Nasr (7) Katherine Lee (7) Edoardo Debenedetti (6) Nicolas Papernot (5) Christopher A. Choquette-Choo (5) Chiyuan Zhang (5)

Research topics

Differential Privacy (1) Privacy (1)

Keywords

adversarial example (8) adversarial attack (6) large language model (5) adversarial robustness (5) adversarial perturbation (3) adversarial learning (3) membership inference (3) adversarial training (2) prompt injection (2) defense evaluation (2) differential privacy (2) robust classification (2) model robustness (2) robustness evaluation (2) adversarial prompt (2) privacy attack (2) language model (2) security evaluation (2) query-based attack (2) machine unlearning (1)

Papers

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments ACL 2026
Consistency Checks for Language Model Forecasters ICLR 2025
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI ICLR 2025
Scalable Extraction of Training Data from Aligned, Production Language Models ICLR 2025
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models ICLR 2025
The Jailbreak Tax: How Useful are Your Jailbreak Outputs? ICML 2025
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards ICML 2025
Persistent Pre-training Poisoning of LLMs ICLR 2025
Adversarial Search Engine Optimization for Large Language Models ICLR 2025
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses ICML 2025
AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents NIPS 2024
Query-Based Adversarial Prompt Generation NIPS 2024
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining ICML 2024
Extracting Training Data From Document-Based VQA Models ICML 2024
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models ICML 2024
Stealing part of a production language model ICML 2024
Universal Jailbreak Backdoors from Poisoned Human Feedback ICLR 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition NIPS 2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models NIPS 2024
Quantifying Memorization Across Neural Language Models ICLR 2023
Counterfactual Memorization in Neural Language Models NIPS 2023
Students Parrot Their Teachers: Membership Inference on Model Distillation NIPS 2023
Are aligned neural networks adversarially aligned? NIPS 2023
(Certified!!) Adversarial Robustness for Free! ICLR 2023
Measuring Forgetting of Memorized Training Examples ICLR 2023
Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems ICML 2023
Data Poisoning Won’t Save You From Facial Recognition ICLR 2022
Large Language Models Can Be Strong Differentially Private Learners ICLR 2022
Increasing Confidence in Adversarial Robustness Evaluations NIPS 2022
Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them ICML 2022
The Privacy Onion Effect: Memorization is Relative NIPS 2022
Differentially Private Learning Needs Better Features (or Much More Data) ICLR 2021
Label-Only Membership Inference Attacks ICML 2021
Antipodes of Label Differential Privacy: PATE and ALIBI NIPS 2021
On Adaptive Attacks to Adversarial Example Defenses NIPS 2020
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations ICML 2020
Adversarial Training and Robustness for Multiple Perturbations NIPS 2019
Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware ICLR 2019
Ensemble Adversarial Training: Attacks and Defenses ICLR 2018