Boyi Wei

4 papers · 2024–2025 · 3 top CS/AI conferences

Achievements

🌍 Conference Polyglot (3) · 🌉 Interdisciplinary Bridge · 🧭 Keyword Pioneer · 🐝 Cross-Pollinator (15)

Conferences

ICLR (2) · ICML (1) · NeurIPS (1)

Top co-authors

Yangsibo Huang (4) · Peter Henderson (4) · Prateek Mittal (3) · Xiangyu Qi (3) · Tinghao Xie (3) · Luxi He (2) · Kai Li (2) · Kaixuan Huang (2) · Dacheng Li (1) · Mengdi Wang (1)

Keywords

large language model (1) · copyright takedown (1) · decoding-time intervention (1)

Papers

On Evaluating the Durability of Safeguards for Open-Weight LLMs · ICLR 2025
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal · ICLR 2025
Evaluating Copyright Takedown Methods for Language Models · NeurIPS 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications · ICML 2024