conftrace_

Nathaniel Li

3 papers · 2023–2024 · 1 conference · across top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (10) 👥 Mega-Team (46) ❓ The Questioner

Conferences

ICML (3)

Top co-authors

Steven Basart (3) · Andy Zou (3) · Dan Hendrycks (3) · Mantas Mazeika (2) · Zifan Wang (2) · Alexander Pan (2) · Long Phan (1) · Adam Alfred Hunt (1) · David Campbell (1) · Uday Tupakula (1)

Keywords

reward optimization (1) · ethical behavior (1) · machine ethics (1)

Papers

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning · ICML 2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · ICML 2024
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark · ICML 2023