Nicholas Carlini

41 papers · 2018–2025 · 5 top CS/AI conferences

Achievements

🌍 Conference Polyglot (5) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (10)

🌈 Renaissance Researcher (7) 🤝 Dynamic Duo (22) 👑 Triple Crown 👥 Mega-Team (34) 🔬 Deep Specialist (13) 🏆 Keyword Champion (4) 🗃️ Keyword Collector (88) ⚡ Prolific Year (9) ❓ The Questioner 🔥 Unstoppable (8) 💎 Century Club (41)

Conferences

ICLR (14) NIPS (14) ICML (11) ACL (1) CVPR (1)

Top co-authors

Florian Tramer (22) Matthew Jagielski (10) Daphne Ippolito (9) Milad Nasr (8) Katherine Lee (8) Chiyuan Zhang (6) Javier Rando (5) Nicolas Papernot (5) Christopher A. Choquette-Choo (5) David Berthelot (4)

Research topics

Privacy (2) Differential Privacy (1)

Keywords

adversarial example (10) adversarial robustness (5) membership inference (4) language model (3) model robustness (3) adversarial perturbation (3) differential privacy (3) adversarial attack (3) privacy attack (3) semi-supervised learning (2) image classification (2) model poisoning (2) training data (2) robustness evaluation (2) domain generalization (2) adversarial training (2) backdoor attack (2) data augmentation (2) distribution shift (2) query-based attack (2)

Papers

Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI · ICML 2025
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards · ICML 2025
AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses · ICML 2025
Scalable Extraction of Training Data from Aligned, Production Language Models · ICLR 2025
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI · ICLR 2025
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models · ICLR 2025
On Evaluating the Durability of Safeguards for Open-Weight LLMs · ICLR 2025
Persistent Pre-training Poisoning of LLMs · ICLR 2025
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models · NIPS 2024
Initialization Matters for Adversarial Transfer Learning · CVPR 2024
Stealing part of a production language model · ICML 2024
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining · ICML 2024
Query-Based Adversarial Prompt Generation · NIPS 2024
Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems · ICML 2023
Counterfactual Memorization in Neural Language Models · NIPS 2023
Students Parrot Their Teachers: Membership Inference on Model Distillation · NIPS 2023
Are aligned neural networks adversarially aligned? · NIPS 2023
Effective Robustness against Natural Distribution Shifts for Models with Different Training Data · NIPS 2023
(Certified!!) Adversarial Robustness for Free! · ICLR 2023
Measuring Forgetting of Memorized Training Examples · ICLR 2023
Part-Based Models Improve Adversarial Robustness · ICLR 2023
Quantifying Memorization Across Neural Language Models · ICLR 2023
Poisoning and Backdooring Contrastive Learning · ICLR 2022
AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation · ICLR 2022
Data Poisoning Won’t Save You From Facial Recognition · ICLR 2022
Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent · ICLR 2022
Increasing Confidence in Adversarial Robustness Evaluations · NIPS 2022
Deduplicating Training Data Makes Language Models Better · ACL 2022
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples · NIPS 2022
The Privacy Onion Effect: Memorization is Relative · NIPS 2022
Handcrafted Backdoors in Deep Neural Networks · NIPS 2022
Label-Only Membership Inference Attacks · ICML 2021
Measuring Robustness to Natural Distribution Shifts in Image Classification · NIPS 2020
On Adaptive Attacks to Adversarial Example Defenses · NIPS 2020
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence · NIPS 2020
ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring · ICLR 2020
Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations · ICML 2020
MixMatch: A Holistic Approach to Semi-Supervised Learning · NIPS 2019
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition · ICML 2019
Adversarial Examples Are a Natural Consequence of Test Error in Noise · ICML 2019
Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples · ICML 2018