conftrace_

Olivia Watkins

6 papers · 2021–2024 · 3 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (15) 👑 Triple Crown

Conferences

NIPS (3) ICML (2) ICLR (1)

Top co-authors

Pieter Abbeel (6) · Yuqing Du (3) · Trevor Darrell (3) · Abhishek Gupta (2) · Sam Toyer (2) · Jacob Andreas (2) · Justin Svegliato (2) · Jessy Lin (1) · Moonkyung Ryu (1) · Luke Bailey (1)

Keywords

reinforcement learning (3) · policy gradient (1) · policy learning (1) · text-to-image generation (1) · reward function (1) · intrinsic motivation (1) · diffusion model (1) · language model (1) · exploration bonus (1) · safety fine-tuning (1) · attack success rate (1) · goal generation (1) · interactive feedback (1) · harmfulness evaluation (1) · advice distillation (1) · jailbreak benchmark (1)

Papers

A StrongREJECT for Empty Jailbreaks · NIPS 2024
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game · ICLR 2024
Learning to Model the World With Language · ICML 2024
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models · NIPS 2023
Guiding Pretraining in Reinforcement Learning with Large Language Models · ICML 2023
Teachable Reinforcement Learning via Advice Distillation · NIPS 2021