Justin Wang
5 papers · 2024–2025 · 3 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot
(3)
Cross-Pollinator
(11)
Taxonomy Completionist
(10)
Conferences
ACL (2)
ICLR (2)
NIPS (1)
Top co-authors
Keywords
adversarial robustness
(1)
ai safety
(1)
instruction tuning
(1)
model alignment
(1)
adversarial attack
(1)
language model
(1)
synthetic data
(1)
representation engineering
(1)
circuit breaker
(1)
multimodal language model
(1)
ai alignment
(1)
data scarcity
(1)
instruction generalization
(1)
large language model
(1)
data diversification
(1)
harmful output
(1)
unseen semantics
(1)
proof-oriented programming
(1)
proof repair
(1)
Papers
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
ACL 2025
Diversification Catalyzes Language Models' Instruction Generalization To Unseen Semantics
ACL 2025
Tamper-Resistant Safeguards for Open-Weight LLMs
ICLR 2025
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
ICLR 2025
Improving Alignment and Robustness with Circuit Breakers
NIPS 2024