Alexander Pan
5 papers · 2022–2024 · 3 top CS/AI conferences
Achievements
🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🐝 Cross-Pollinator (9) 👥 Mega-Team (46) ❓ The Questioner (2)
Conferences
ICML (3)
ICLR (1)
NIPS (1)
Papers
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
NIPS 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
ICML 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking
ICML 2024
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark
ICML 2023
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
ICLR 2022