conftrace_

Justin Wang

5 papers · 2024–2025 · 3 conferences · across top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🐝 Cross-Pollinator (11) 🗺️ Taxonomy Completionist (10)

Conferences

ACL (2) ICLR (2) NIPS (1)

Top co-authors

Andy Zou (3) Dan Hendrycks (3) Maxwell Lin (3) Maksym Andriushchenko (2) Dylan Zhang (2) Derek Duenas (2) Rowan Wang (2) Matt Fredrikson (2) Long Phan (2) Alexandra Souly (1)

Keywords

adversarial robustness (1) ai safety (1) instruction tuning (1) model alignment (1) adversarial attack (1) language model (1) synthetic data (1) representation engineering (1) circuit breaker (1) multimodal language model (1) ai alignment (1) data scarcity (1) instruction generalization (1) large language model (1) data diversification (1) harmful output (1) unseen semantics (1) proof-oriented programming (1) proof repair (1)

Papers

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity · ACL 2025
Diversification Catalyzes Language Models’ Instruction Generalization To Unseen Semantics · ACL 2025
Tamper-Resistant Safeguards for Open-Weight LLMs · ICLR 2025
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents · ICLR 2025
Improving Alignment and Robustness with Circuit Breakers · NIPS 2024