Peter Henderson

24 papers · 2018–2025 · 10 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏃 Academic Marathon (7)

🌈 Renaissance Researcher (6) 🗺️ Taxonomy Completionist (31) 👑 Triple Crown 🏆 Grand Slam 👥 Mega-Team (40) 💎 Century Club (24) ⚡ Prolific Year (5) 🗃️ Keyword Collector (70)

Conferences

ICLR (5) ICML (5) NIPS (4) AAAI (3) JMLR (2) AACL (1) CORL (1) EMNLP (1) IJCNLP (1) NAACL (1)

Top co-authors

Percy Liang (6) Prateek Mittal (6) Yangsibo Huang (6) Xiangyu Qi (6) Tinghao Xie (5) Dan Jurafsky (4) Joelle Pineau (4) Boyi Wei (4) Daniel E. Ho (4) Rishi Bommasani (4)

Keywords

large language model (4) population estimation (2) multi-armed bandit (2) reward estimation (2) value function (2) bias detection (2) reinforcement learning (2) dataset analysis (2) legal reasoning (2) model evaluation (2) transfer learning (1) text representation (1) policy optimization (1) model behavior (1) temporal difference learning (1) kl divergence (1) autoregressive transformer (1) energy efficiency (1) instruction tuning (1) intellectual property (1)

Papers

LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain NAACL 2025
On Evaluating the Durability of Safeguards for Open-Weight LLMs ICLR 2025
Fantastic Copyrighted Beasts and How (Not) to Generate Them ICLR 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep ICLR 2025
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI ICML 2025
Position: On the Societal Impact of Open Foundation Models ICML 2024
Visual Adversarial Examples Jailbreak Aligned Large Language Models AAAI 2024
Evaluating Copyright Takedown Methods for Language Models NIPS 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ICML 2024
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! ICLR 2024
Position: A Safe Harbor for AI Evaluation and Red Teaming ICML 2024
Entropy Regularization for Population Estimation AAAI 2023
Foundation Models and Fair Use JMLR 2023
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models NIPS 2023
Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models NIPS 2023
Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection AAAI 2023
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset NIPS 2022
Text Characterization Toolkit (TCT) IJCNLP 2022
Text Characterization Toolkit (TCT) AACL 2022
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning JMLR 2020
With Little Power Comes Great Responsibility EMNLP 2020
Separating value functions across time-scales ICML 2019
Reward Estimation for Variance Reduction in Deep Reinforcement Learning CORL 2018