conftrace_

Stephen Casper

7 papers · 2020–2026 · 4 conferences · across top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🌍 Conference Polyglot (4) 🏃 Academic Marathon (5) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge

🐝 Cross-Pollinator (15) 🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🧬 Topic Evolution ❓ The Questioner (2)

Conferences

AAAI (2) EMNLP (2) NIPS (2) ACL (1)

Top co-authors

Dylan Hadfield-Menell (3) Gabriel Kreiman (2) Xavier Boix (1) Stuart Shieber (1) Robert Kirk (1) Kevin Zhang (1) Tong Bu (1) Kaivalya Hariharan (1) Xander Davies (1) Adam Gleave (1)

Keywords

neural network (3) adversarial attack (3) representation learning (3) model debugging (2) model safety (1) probing analysis (1) latent representation (1) deep neural network (1) implicit regularization (1) language model (1) out-of-distribution detection (1) jailbreak attack (1) red teaming (1) feature probing (1) model diagnosis (1) network width (1) few-shot prompting (1) internal representation (1) dialog system (1) open-domain dialog (1)

Papers

STACK: Adversarial Attacks on LLM Safeguard Pipelines · AAAI 2026
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks · EMNLP 2025
Red Teaming Deep Neural Networks with Feature Synthesis Tools · NIPS 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness? · EMNLP 2023
Robust Feature-Level Adversaries are Interpretability Tools · NIPS 2022
Frivolous Units: Wider Networks Are Not Really That Wide · AAAI 2021
Probing Neural Dialog Models for Conversational Understanding · ACL 2020