Katherine Metcalf
11 papers · 2019–2025 · 8 top CS/AI conferences
Achievements
Conference Polyglot (8)
Keyword Pioneer
Hot Topic Early Bird
Interdisciplinary Bridge
Academic Marathon (6)
Cross-Pollinator (14)
Renaissance Researcher (5)
Century Club (11)
Trend Setter
The Questioner (2)
Conferences
ICML (3)
ACL (2)
AAAI (1)
CORL (1)
EMNLP (1)
ICLR (1)
IJCAI (1)
INTERSPEECH (1)
Keywords
policy learning (2)
preference-based reinforcement learning (2)
representation learning (2)
hierarchical learning (1)
out-of-distribution generalization (1)
direct preference optimization (1)
embedding space (1)
human feedback (1)
language model alignment (1)
parameter-efficient learning (1)
temporal abstraction (1)
cross-lingual alignment (1)
reinforcement learning from human feedback (1)
parameter efficient (1)
reward function (1)
self-supervised learning (1)
sample efficiency (1)
generalization capability (1)
reward model (1)
deep reinforcement learning (1)
Papers
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs
ICML 2025
Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models
ACL 2025
On the Way to LLM Personalization: Learning to Remember User Conversations
ACL 2025
Aligning LLMs by Predicting Preferences from User Writing Samples
ICML 2025
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
EMNLP 2024
Can You Rely on Synthetic Labellers in Preference-Based Reinforcement Learning? It's Complicated
AAAI 2024
Hindsight PRIORs for Reward Learning from Human Preferences
ICLR 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
ICML 2024
Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards
CORL 2023
Unsupervised Hierarchical Temporal Abstraction by Simultaneously Learning Expectations and Representations
IJCAI 2019
Mirroring to Build Trust in Digital Assistants
INTERSPEECH 2019