Zhengxuan Wu
24 papers · 2020–2025 · 9 conferences · across top CS/AI conferences
Achievements
Academic Marathon (5)
Conference Polyglot (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (13)
Renaissance Researcher (5)
Taxonomy Completionist (44)
Dynamic Duo (18)
The Questioner
Prolific Year (5)
Keyword Collector (93)
Unstoppable (6)
Century Club (24)
Conferences
EMNLP (5)
ACL (4)
ICML (4)
NIPS (4)
NAACL (3)
AAAI (1)
CLEAR (1)
IJCNLP (1)
JMLR (1)
Keywords
causal abstraction (6)
language model (6)
causal inference (5)
neural network (4)
sentiment analysis (4)
natural language processing (3)
pretrained language model (3)
large language model (3)
transfer learning (3)
representation learning (3)
ternary classification (2)
benchmark dataset (2)
concept-based explanation (2)
instruction tuning (2)
model alignment (2)
distributed representation (2)
attention mechanism (1)
dataset creation (1)
few-shot learning (1)
multi-task learning (1)
Papers
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
JMLR 2025
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
ICML 2025
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models
EMNLP 2024
ReFT: Representation Finetuning for Language Models
NIPS 2024
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
ICML 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
ACL 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
NAACL 2024
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
CLEAR 2024
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
EMNLP 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
NIPS 2023
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training
ACL 2023
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies
EMNLP 2023
Rigorously Assessing Natural Language Explanations of Neurons
EMNLP 2023
Causal Proxy Models for Concept-based Model Explanations
ICML 2023
Causal Distillation for Language Models
NAACL 2022
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
NIPS 2022
Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
ACL 2022
ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
NIPS 2022
Inducing Causal Structure for Interpretable Neural Networks
ICML 2022
DynaSent: A Dynamic Benchmark for Sentiment Analysis
IJCNLP 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis
ACL 2021
Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis
AAAI 2021
Dynabench: Rethinking Benchmarking in NLP
NAACL 2021
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis
EMNLP 2020