Stephen Casper
7 papers · 2020–2026 · 4 conferences · across top CS/AI conferences
Achievements
Hot Topic Early Bird
Conference Polyglot (4)
Academic Marathon (5)
Renaissance Researcher (5)
Interdisciplinary Bridge
Cross-Pollinator (15)
Taxonomy Completionist (21)
Keyword Pioneer
Topic Evolution
The Questioner (2)
Conferences
AAAI (2)
EMNLP (2)
NIPS (2)
ACL (1)
Top co-authors
Keywords
neural network (3)
adversarial attack (3)
representation learning (3)
model debugging (2)
model safety (1)
probing analysis (1)
latent representation (1)
deep neural network (1)
implicit regularization (1)
language model (1)
out-of-distribution detection (1)
jailbreak attack (1)
red teaming (1)
feature probing (1)
model diagnosis (1)
network width (1)
few-shot prompting (1)
internal representation (1)
dialog system (1)
open-domain dialog (1)
Papers
STACK: Adversarial Attacks on LLM Safeguard Pipelines
AAAI 2026
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
EMNLP 2025
Red Teaming Deep Neural Networks with Feature Synthesis Tools
NIPS 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
EMNLP 2023
Robust Feature-Level Adversaries are Interpretability Tools
NIPS 2022
Frivolous Units: Wider Networks Are Not Really That Wide
AAAI 2021
Probing Neural Dialog Models for Conversational Understanding
ACL 2020