Dan Hendrycks

29 papers · 2018–2025 · 9 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (9) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐣 Hot Topic Early Bird

🗺️ Taxonomy Completionist (57) 🧭 Keyword Pioneer 🐝 Cross-Pollinator (10) 🤝 Dynamic Duo (14) 👑 Triple Crown 🏆 Keyword Champion (2) 👥 Mega-Team (46) 🔥 Unstoppable (8) 💎 Century Club (29) ⚡ Prolific Year (7) ❓ The Questioner (3) 🗃️ Keyword Collector (80)

Conferences

NIPS (8) ICLR (7) ICML (7) CVPR (2) ACL (1) ECCV (1) EMNLP (1) ICCV (1) JMLR (1)

Top co-authors

Mantas Mazeika (14) Dawn Song (14) Andy Zou (11) Steven Basart (10) Jacob Steinhardt (8) Bo Li (7) Long Phan (4) Chulin Xie (3) Thomas Dietterich (3) Alice Gatti (3)

Keywords

out-of-distribution detection (6) anomaly detection (5) adversarial robustness (4) data augmentation (3) uncertainty estimation (3) question answering (2) open category detection (2) distribution shift (2) image classification (2) model robustness (2) adversarial example (2) deep neural network (2) benchmark evaluation (2) label noise (2) out-of-distribution robustness (2) ai safety (2) pac learning (2) named entity recognition (1) temporal reasoning (1) natural language processing (1)

Papers

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models · ICLR 2025
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents · ICLR 2025
Tamper-Resistant Safeguards for Open-Weight LLMs · ICLR 2025
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression · ICML 2024
Improving Alignment and Robustness with Circuit Breakers · NIPS 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · ICML 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning · ICML 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? · NIPS 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models · NIPS 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark · ICML 2023
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding · EMNLP 2023
Forecasting Future World Events With Neural Networks · NIPS 2022
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection · NIPS 2022
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures · CVPR 2022
A Spectral View of Randomized Smoothing under Common Corruptions: Benchmarking and Improving Certified Robustness · ECCV 2022
PAC Guarantees and Effective Algorithms for Detecting Novel Categories · JMLR 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios · NIPS 2022
Scaling Out-of-Distribution Detection for Real-World Settings · ICML 2022
Measuring Massive Multitask Language Understanding · ICLR 2021
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization · ICCV 2021
Natural Adversarial Examples · CVPR 2021
Aligning AI With Shared Human Values · ICLR 2021
Pretrained Transformers Improve Out-of-Distribution Robustness · ACL 2020
Deep Anomaly Detection with Outlier Exposure · ICLR 2019
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty · NIPS 2019
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations · ICLR 2019
Using Pre-Training Can Improve Model Robustness and Uncertainty · ICML 2019
Open Category Detection with PAC Guarantees · ICML 2018
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise · NIPS 2018