Matt Fredrikson

21 papers · 2018–2026 · 6 top CS/AI conferences

Achievements

🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🐝 Cross-Pollinator (11) 🗺️ Taxonomy Completionist (22) 👑 Triple Crown 🧬 Topic Evolution 🔥 Unstoppable (8) 🗃️ Keyword Collector (73) 💎 Century Club (20)

Conferences

ICLR (8) NIPS (7) ACL (2) ICML (2) AISTATS (1) IJCAI (1)

Top co-authors

Klas Leino (9) Zifan Wang (9) Anupam Datta (5) Kai Hu (4) Samuel Yeom (3) Emily Black (3) Andy Zou (3) Maxwell Lin (2) Maksym Andriushchenko (2) Saranya Vijayakumar (2)

Keywords

adversarial robustness (5) adversarial attack (3) neural network (2) large language model (2) jailbreak attack (2) certified robustness (2) adversarial example (2) symbolic reasoning (1) algorithmic fairness (1) adversarial learning (1) ai safety (1) constrained optimization (1) safety alignment (1) model alignment (1) feature attribution (1) sparse optimization (1) neural network interpretability (1) linear regression (1) gradient descent (1) adversarial training (1)

Papers

Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models · ACL 2026
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents · ICLR 2025
Aligned LLMs Are Not Aligned Browser Agents · ICLR 2025
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization · NIPS 2024
Improving Alignment and Robustness with Circuit Breakers · NIPS 2024
A Recipe for Improved Certifiable Robustness · ICLR 2024
Grounding Neural Inference with Satisfiability Modulo Theories · NIPS 2023
Unlocking Deterministic Robustness Certification on ImageNet · NIPS 2023
On the Perils of Cascading Robust Classifiers · ICLR 2023
Selective Ensembles for Consistent Predictions · ICLR 2022
Robust Models Are More Interpretable Because Attributions Look Normal · ICML 2022
Consistent Counterfactuals for Deep Models · ICLR 2022
Fast Geometric Projections for Local Robustness Certification · ICLR 2021
Relaxing Local Robustness · NIPS 2021
Globally-Robust Neural Networks · ICML 2021
Learning Fair Representations for Kernel Models · AISTATS 2020
Smoothed Geometry for Robust Attribution · NIPS 2020
Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM Language Models · ACL 2020
Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness · IJCAI 2020
Feature-Wise Bias Amplification · ICLR 2019
Hunting for Discriminatory Proxies in Linear Regression Models · NIPS 2018