Olivia Watkins
6 papers · 2021–2024 · 3 top CS/AI conferences
Achievements
🌍 Conference Polyglot (3)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (15)
👑 Triple Crown
Conferences
NIPS (3)
ICML (2)
ICLR (1)
Keywords
reinforcement learning (3)
policy gradient (1)
policy learning (1)
text-to-image generation (1)
reward function (1)
intrinsic motivation (1)
diffusion model (1)
language model (1)
exploration bonus (1)
safety fine-tuning (1)
attack success rate (1)
goal generation (1)
interactive feedback (1)
harmfulness evaluation (1)
advice distillation (1)
jailbreak benchmark (1)
Papers
A StrongREJECT for Empty Jailbreaks
NIPS 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
ICLR 2024
Learning to Model the World With Language
ICML 2024
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
NIPS 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models
ICML 2023
Teachable Reinforcement Learning via Advice Distillation
NIPS 2021