Mantas Mazeika

15 papers · 2018–2025 · 4 top CS/AI conferences

Achievements

🌍 Conference Polyglot (4) 🏃 Academic Marathon (7) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (9)

🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (40) 👥 Mega-Team (46) 👑 Triple Crown 🤝 Dynamic Duo (14) 🗃️ Keyword Collector (50) 💎 Century Club (15) 🔥 Unstoppable (5) ❓ The Questioner (2) ⚡ Prolific Year (5)

Conferences

NIPS (6) ICML (5) ICLR (3) CVPR (1)

Top co-authors

Dan Hendrycks (14) Andy Zou (8) Dawn Song (8) Steven Basart (6) Bo Li (5) Jacob Steinhardt (5) Long Phan (3) Alice Gatti (3) Rishub Tamirisa (2) David Forsyth (2)

Keywords

adversarial robustness (3) uncertainty estimation (3) out-of-distribution detection (3) anomaly detection (2) deep neural network (2) label noise (2) model robustness (1) uncertainty quantification (1) adversarial learning (1) question answering (1) toxicity detection (1) confidence calibration (1) video understanding (1) ai safety (1) temporal reasoning (1) event forecasting (1) spectral analysis (1) data augmentation (1) affective computing (1) self-supervised learning (1)

Papers

Tamper-Resistant Safeguards for Open-Weight LLMs · ICLR 2025
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? · NIPS 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning · ICML 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · ICML 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models · NIPS 2023
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios · NIPS 2022
How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection · ICML 2022
Forecasting Future World Events With Neural Networks · NIPS 2022
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures · CVPR 2022
Scaling Out-of-Distribution Detection for Real-World Settings · ICML 2022
Measuring Massive Multitask Language Understanding · ICLR 2021
Deep Anomaly Detection with Outlier Exposure · ICLR 2019
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty · NIPS 2019
Using Pre-Training Can Improve Model Robustness and Uncertainty · ICML 2019
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise · NIPS 2018