Peter Henderson
24 papers · 2018–2025 · 10 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (10)
Interdisciplinary Bridge
Keyword Pioneer
Hot Topic Early Bird
Academic Marathon (7)
Renaissance Researcher (6)
Taxonomy Completionist (31)
Triple Crown
Grand Slam
Mega-Team (40)
Century Club (24)
Prolific Year (5)
Keyword Collector (70)
Conferences
ICLR (5)
ICML (5)
NIPS (4)
AAAI (3)
JMLR (2)
AACL (1)
CORL (1)
EMNLP (1)
IJCNLP (1)
NAACL (1)
Keywords
large language model (4)
population estimation (2)
multi-armed bandit (2)
reward estimation (2)
value function (2)
bias detection (2)
reinforcement learning (2)
dataset analysis (2)
legal reasoning (2)
model evaluation (2)
transfer learning (1)
text representation (1)
policy optimization (1)
model behavior (1)
temporal difference learning (1)
kl divergence (1)
autoregressive transformer (1)
energy efficiency (1)
instruction tuning (1)
intellectual property (1)
Papers
LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain
NAACL 2025
On Evaluating the Durability of Safeguards for Open-Weight LLMs
ICLR 2025
Fantastic Copyrighted Beasts and How (Not) to Generate Them
ICLR 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
ICLR 2025
Safety Alignment Should be Made More Than Just a Few Tokens Deep
ICLR 2025
Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI
ICML 2025
Position: On the Societal Impact of Open Foundation Models
ICML 2024
Visual Adversarial Examples Jailbreak Aligned Large Language Models
AAAI 2024
Evaluating Copyright Takedown Methods for Language Models
NIPS 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
ICLR 2024
Position: A Safe Harbor for AI Evaluation and Red Teaming
ICML 2024
Entropy Regularization for Population Estimation
AAAI 2023
Foundation Models and Fair Use
JMLR 2023
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
NIPS 2023
Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models
NIPS 2023
Integrating Reward Maximization and Population Estimation: Sequential Decision-Making for Internal Revenue Service Audit Selection
AAAI 2023
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
NIPS 2022
Text Characterization Toolkit (TCT)
IJCNLP 2022
Text Characterization Toolkit (TCT)
AACL 2022
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning
JMLR 2020
With Little Power Comes Great Responsibility
EMNLP 2020
Separating value functions across time-scales
ICML 2019
Reward Estimation for Variance Reduction in Deep Reinforcement Learning
CORL 2018