conftrace_

Katherine Metcalf

11 papers · 2019–2025 · 8 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6)

🐝 Cross-Pollinator (14) 🌈 Renaissance Researcher (5) 💎 Century Club (11) 📈 Trend Setter ❓ The Questioner (2)

Conferences

ICML (3) ACL (2) AAAI (1) CORL (1) EMNLP (1) ICLR (1) IJCAI (1) INTERSPEECH (1)

Top co-authors

Barry-John Theobald (7) Nicholas Apostoloff (3) Natalie Mackraz (3) Skyler Seto (2) Maartje Ter Hoeve (2) Yizhe Zhang (2) Miguel Sarabia (2) Luca Zappella (2) Masha Fedzechkina (2) Falaah Arif Khan (1)

Keywords

policy learning (2) preference-based reinforcement learning (2) representation learning (2) hierarchical learning (1) out-of-distribution generalization (1) direct preference optimization (1) embedding space (1) human feedback (1) language model alignment (1) parameter-efficient learning (1) temporal abstraction (1) cross-lingual alignment (1) reinforcement learning from human feedback (1) parameter efficient (1) reward function (1) self-supervised learning (1) sample efficiency (1) generalization capability (1) reward model (1) deep reinforcement learning (1)

Papers

Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs · ICML 2025
Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models · ACL 2025
On the Way to LLM Personalization: Learning to Remember User Conversations · ACL 2025
Aligning LLMs by Predicting Preferences from User Writing Samples · ICML 2025
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization · EMNLP 2024
Can You Rely on Synthetic Labellers in Preference-Based Reinforcement Learning? It’s Complicated · AAAI 2024
Hindsight PRIORs for Reward Learning from Human Preferences · ICLR 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models · ICML 2024
Sample-Efficient Preference-based Reinforcement Learning with Dynamics Aware Rewards · CORL 2023
Unsupervised Hierarchical Temporal Abstraction by Simultaneously Learning Expectations and Representations · IJCAI 2019
Mirroring to Build Trust in Digital Assistants · INTERSPEECH 2019