conftrace_

Anca Dragan

64 papers · 2012–2025 · 9 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer
🐣 Hot Topic Early Bird
🗺️ Taxonomy Completionist (15)
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (9)
🏃 Academic Marathon (13)
🌟 Keyword Trendsetter Combo (5)
🤝 Dynamic Duo (14)
👑 Triple Crown
🏆 Keyword Champion (4)
🏆 Grand Slam
🔬 Deep Specialist (13)
🗃️ Keyword Collector (55)
📈 Trend Setter
🔥 Unstoppable (10)
🚀 Conference Pioneer
⚡ Prolific Year (12)
💎 Century Club (64)
❓ The Questioner (2)

Conferences

ICML (15) NIPS (14) ICLR (13) RSS (10) CORL (6) ACL (2) IJCAI (2) AAAI (1) EMNLP (1)

Papers

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning · ICLR 2025
AssistanceZero: Scalably Solving Assistance Games · ICML 2025
Adversaries Can Misuse Combinations of Safe Models · ICML 2025
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback · ICLR 2025
Context Steering: Controllable Personalization at Inference Time · ICLR 2025
Correlated Proxies: A New Definition and Improved Mitigation for Reward Hacking · ICLR 2025
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity · ICLR 2024
Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making · ICML 2024
Learning to Model the World With Language · ICML 2024
AI Alignment with Changing and Influenceable Reward Functions · ICML 2024
Learning to Assist Humans without Inferring Rewards · NIPS 2024
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback · NIPS 2024
Trajectory Improvement and Reward Learning from Comparative Language Feedback · CORL 2024
Learning Optimal Advantage from Preferences and Mistaking It for Reward · AAAI 2024
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 · EMNLP 2024
Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation · ICML 2024
The Effective Horizon Explains Deep RL Performance in Stochastic Environments · ICLR 2024
Confronting Reward Model Overoptimization with Constrained RLHF · ICLR 2024
Quantifying Assistive Robustness Via the Natural-Adversarial Frontier · CORL 2023
On the Sensitivity of Reward Inference to Misspecified Human Models · ICLR 2023
Causal Confusion and Reward Misidentification in Preference-Based Reward Learning · ICLR 2023
Learning to Influence Human Behavior with Offline Reinforcement Learning · NIPS 2023
Bridging RL Theory and Practice with the Effective Horizon · NIPS 2023
Automatically Auditing Large Language Models via Discrete Optimization · ICML 2023
Contextual Reliability: When Different Features Matter in Different Contexts · ICML 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control · CORL 2023
The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models · ICLR 2022
Estimating and Penalizing Induced Preference Shifts in Recommender Systems · ICML 2022
Inferring Rewards from Language in Context · ACL 2022
Uni[MASK]: Unified Inference in Sequential Decision Problems · NIPS 2022
First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization · NIPS 2022
Learning Representations that Enable Generalization in Assistive Tasks · CORL 2022
On complementing end-to-end human behavior predictors with planning · RSS 2021
Pragmatic Image Compression for Human-in-the-Loop Decision-Making · NIPS 2021
Learning What To Do by Simulating the Past · ICLR 2021
X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback · ICLR 2021
Value Alignment Verification · ICML 2021
Policy Gradient Bayesian Robust Optimization for Imitation Learning · ICML 2021
AvE: Assistance via Empowerment · NIPS 2020
Reward-rational (implicit) choice: A unifying formalism for reward learning · NIPS 2020
Learning Human Objectives by Evaluating Hypothetical Behavior · ICML 2020
Assisted Perception: Optimizing Observations to Communicate State · CORL 2020
Preference learning along multiple criteria: A game-theoretic perspective · NIPS 2020
Preferences Implicit in the State of the World · ICLR 2019
On the Utility of Learning about Humans for Human-AI Coordination · NIPS 2019
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning · ICML 2019
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference · ICML 2019
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning · ICML 2018
Probabilistically Safe Robot Planning with Confidence-Based Human Predictions · RSS 2018
Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior · NIPS 2018
Shared Autonomy via Deep Reinforcement Learning · RSS 2018
Simplifying Reward Design through Divide-and-Conquer · RSS 2018
Inverse Reward Design · NIPS 2017
Should Robots be Obedient? · IJCAI 2017
Active Preference-Based Learning of Reward Functions · RSS 2017
Enabling Robots to Communicate Their Objectives · RSS 2017
Translating Neuralese · ACL 2017
DART: Noise Injection for Robust Imitation Learning · CORL 2017
The Off-Switch Game · IJCAI 2017
Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces · RSS 2016
Cooperative Inverse Reinforcement Learning · NIPS 2016
An Analysis of Deceptive Robot Motion · RSS 2014
Generating Legible Motion · RSS 2013
Formalizing Assistive Teleoperation · RSS 2012