Matt Fredrikson
21 papers · 2018–2026 · 6 top CS/AI conferences
Achievements
Academic Marathon (7)
Keyword Pioneer
Interdisciplinary Bridge
Conference Polyglot (6)
Cross-Pollinator (11)
Taxonomy Completionist (22)
Triple Crown
Topic Evolution
Unstoppable (8)
Keyword Collector (73)
Century Club (20)
Conferences
ICLR (8)
NIPS (7)
ACL (2)
ICML (2)
AISTATS (1)
IJCAI (1)
Keywords
adversarial robustness (5)
adversarial attack (3)
neural network (2)
large language model (2)
jailbreak attack (2)
certified robustness (2)
adversarial example (2)
symbolic reasoning (1)
algorithmic fairness (1)
adversarial learning (1)
ai safety (1)
constrained optimization (1)
safety alignment (1)
model alignment (1)
feature attribution (1)
sparse optimization (1)
neural network interpretability (1)
linear regression (1)
gradient descent (1)
adversarial training (1)
Papers
Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
ACL 2026
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
ICLR 2025
Aligned LLMs Are Not Aligned Browser Agents
ICLR 2025
Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
NIPS 2024
Improving Alignment and Robustness with Circuit Breakers
NIPS 2024
A Recipe for Improved Certifiable Robustness
ICLR 2024
Grounding Neural Inference with Satisfiability Modulo Theories
NIPS 2023
Unlocking Deterministic Robustness Certification on ImageNet
NIPS 2023
On the Perils of Cascading Robust Classifiers
ICLR 2023
Selective Ensembles for Consistent Predictions
ICLR 2022
Robust Models Are More Interpretable Because Attributions Look Normal
ICML 2022
Consistent Counterfactuals for Deep Models
ICLR 2022
Fast Geometric Projections for Local Robustness Certification
ICLR 2021
Relaxing Local Robustness
NIPS 2021
Globally-Robust Neural Networks
ICML 2021
Learning Fair Representations for Kernel Models
AISTATS 2020
Smoothed Geometry for Robust Attribution
NIPS 2020
Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM Language Models
ACL 2020
Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness
IJCAI 2020
Feature-Wise Bias Amplification
ICLR 2019
Hunting for Discriminatory Proxies in Linear Regression Models
NIPS 2018