conftrace_

Tomasz Korbak

10 papers · 2021–2024 · 4 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (27) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge

🌍 Conference Polyglot (4) 🐝 Cross-Pollinator (14) 👥 Mega-Team (34) 💎 Century Club (10)

Conferences

ICLR (3) ICML (3) NIPS (3) EMNLP (1)

Top co-authors

Germán Kruszewski (4) Marc Dymetman (4) Ethan Perez (4) Meg Tong (3) Samuel R. Bowman (3) Timothy Maxwell (2) Jos Rozen (2) Christopher Buckley (2) David Duvenaud (2) Esin Durmus (2)

Keywords

language model (4) reinforcement learning (3) kl divergence (2) distribution matching (2) large language model (2) policy gradient (2) language model fine-tuning (2) reinforcement learning from human feedback (2) bayesian inference (1) imitation learning (1) language model alignment (1) signaling games (1) conditional generation (1) transfer learning (1) generative model (1) preference alignment (1) distribution learning (1) inductive bias (1) reward maximization (1) self-supervised learning (1)

Papers

The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” ICLR 2024
Compositional Preference Models for Aligning LMs ICLR 2024
Many-shot Jailbreaking NIPS 2024
Towards Understanding Sycophancy in Language Models ICLR 2024
Pretraining Language Models with Human Preferences ICML 2023
Aligning Language Models with Preferences through $f$-divergence Minimization ICML 2023
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting NIPS 2022
Controlling Conditional Language Models without Catastrophic Forgetting ICML 2022
RL with KL penalties is better viewed as Bayesian inference EMNLP 2022
Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication NIPS 2021