Andy Zou
12 papers · 2021–2025 · 4 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (11)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌍 Conference Polyglot (4)
🌈 Renaissance Researcher (7)
🤝 Dynamic Duo (11)
👥 Mega-Team (46)
🔥 Unstoppable (5)
💎 Century Club (12)
❓ The Questioner (2)
Conferences
ICML (4)
NIPS (4)
ICLR (3)
CVPR (1)
Keywords
adversarial robustness (3)
anomaly detection (2)
question answering (1)
temporal reasoning (1)
event forecasting (1)
video understanding (1)
ai safety (1)
robustness certification (1)
affective computing (1)
model alignment (1)
adversarial training (1)
adversarial attack (1)
deep neural network (1)
language model (1)
data augmentation (1)
lipschitz constant (1)
out-of-distribution detection (1)
representation engineering (1)
circuit breaker (1)
image classification (1)
Papers
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
ICLR 2025
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
ICML 2024
Improving Alignment and Robustness with Circuit Breakers
NIPS 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
ICML 2024
Unlocking Deterministic Robustness Certification on ImageNet
NIPS 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
ICML 2023
Scaling Out-of-Distribution Detection for Real-World Settings
ICML 2022
Forecasting Future World Events With Neural Networks
NIPS 2022
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
CVPR 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
NIPS 2022
Measuring Massive Multitask Language Understanding
ICLR 2021