Tomasz Korbak
10 papers · 2021–2024 · 4 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (27)
Keyword Pioneer
Hot Topic Early Bird
Renaissance Researcher (6)
Interdisciplinary Bridge
Conference Polyglot (4)
Cross-Pollinator (14)
Mega-Team (34)
Century Club (10)
Conferences
ICLR (3)
ICML (3)
NIPS (3)
EMNLP (1)
Keywords
language model (4)
reinforcement learning (3)
kl divergence (2)
distribution matching (2)
large language model (2)
policy gradient (2)
language model fine-tuning (2)
reinforcement learning from human feedback (2)
bayesian inference (1)
imitation learning (1)
language model alignment (1)
signaling games (1)
conditional generation (1)
transfer learning (1)
generative model (1)
preference alignment (1)
distribution learning (1)
inductive bias (1)
reward maximization (1)
self-supervised learning (1)
Papers
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”
ICLR 2024
Compositional Preference Models for Aligning LMs
ICLR 2024
Many-shot Jailbreaking
NIPS 2024
Towards Understanding Sycophancy in Language Models
ICLR 2024
Pretraining Language Models with Human Preferences
ICML 2023
Aligning Language Models with Preferences through $f$-divergence Minimization
ICML 2023
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
NIPS 2022
Controlling Conditional Language Models without Catastrophic Forgetting
ICML 2022
RL with KL penalties is better viewed as Bayesian inference
EMNLP 2022
Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
NIPS 2021