conftrace_

Adam Gleave

12 papers · 2016–2026 · across 6 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (14) 🧭 Keyword Pioneer 🐝 Cross-Pollinator (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6)

🏃 Academic Marathon (9) 💎 Century Club (11) ❓ The Questioner

Conferences

AAAI (3) ICLR (3) ICML (3) EMNLP (1) JMLR (1) OSDI (1)

Top co-authors

Tom Tseng (5) Kellin Pelrine (4) Stuart Russell (4) Michael D Dennis (2) Tony Tong Wang (2) Oskar John Hollinsworth (2) Sergey Levine (2) Joar Max Viktor Skalse (2) Brendan Murphy (2) Aaron David Tucker (2)

Research topics

Keywords

adversarial attack (3) large language model (2) data poisoning (1) game artificial intelligence (1) adversarial training (1) harmful content (1) game playing (1) partial identifiability (1) worst-case performance (1) reward function (1) reward learning (1) backdoor attack (1) safety alignment (1) expert demonstration (1) game theory (1) zero-shot transfer (1) model scaling (1) red teaming (1) fine-tuning attack (1) adversarial robustness (1)

Papers

STACK: Adversarial Attacks on LLM Safeguard Pipelines · AAAI 2026
Can Go AIs Be Adversarially Robust? · AAAI 2025
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility · EMNLP 2025
Scaling Trends for Data Poisoning in LLMs · AAAI 2025
Scaling Trends in Language Model Robustness · ICML 2025
STARC: A General Framework For Quantifying Differences Between Reward Functions · ICLR 2024
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning · ICML 2023
Adversarial Policies Beat Superhuman Go AIs · ICML 2023
Quantifying Differences in Reward Functions · ICLR 2021
Stable-Baselines3: Reliable Reinforcement Learning Implementations · JMLR 2021
Adversarial Policies: Attacking Deep Reinforcement Learning · ICLR 2020
Firmament: Fast, Centralized Cluster Scheduling at Scale · OSDI 2016