Rohin Shah
11 papers · 2019–2025 · 4 top CS/AI conferences
Achievements
Hot Topic Early Bird
Keyword Pioneer
Interdisciplinary Bridge
Taxonomy Completionist (19)
Conference Polyglot (4)
Academic Marathon (6)
Cross-Pollinator (4)
Renaissance Researcher (5)
Triple Crown
Century Club (11)
Conferences
NIPS (6)
ICLR (2)
ICML (2)
EMNLP (1)
Top co-authors
Keywords
reinforcement learning (3)
ai safety (2)
multi-agent system (2)
imitation learning (2)
sparse autoencoder (2)
model analysis (1)
reward function (1)
neural network interpretability (1)
markov decision process (1)
behavior cloning (1)
model evaluation (1)
distribution shift (1)
optimal policy (1)
reward learning (1)
latent representation (1)
pairwise comparison (1)
human feedback (1)
benchmark dataset (1)
language model (1)
inverse reinforcement learning (1)
Papers
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
ICML 2025
On scalable oversight with weak LLMs judging strong LLMs
NIPS 2024
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
NIPS 2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
EMNLP 2024
BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks
NIPS 2023
Learning What To Do by Simulating the Past
ICLR 2021
Optimal Policies Tend To Seek Power
NIPS 2021
The MAGICAL Benchmark for Robust Imitation
NIPS 2020
Preferences Implicit in the State of the World
ICLR 2019
On the Utility of Learning about Humans for Human-AI Coordination
NIPS 2019
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
ICML 2019