Zhengxuan Wu

24 papers · 2020–2025 · 9 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (5) 🌍 Conference Polyglot (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (44) 🌉 Interdisciplinary Bridge 🤝 Dynamic Duo (18) ❓ The Questioner ⚡ Prolific Year (5) 🗃️ Keyword Collector (93) 🔥 Unstoppable (6) 💎 Century Club (24)

Conferences

EMNLP (5) ACL (4) ICML (4) NIPS (4) NAACL (3) AAAI (1) CLEAR (1) IJCNLP (1) JMLR (1)

Top co-authors

Christopher Potts (18) Atticus Geiger (15) Noah Goodman (6) Jing Huang (6) Thomas Icard (5) Aryaman Arora (4) Zheng Wang (3) Douwe Kiela (3) Karel D’Oosterlinck (2) Desmond C. Ong (2)

Keywords

causal abstraction (6) language model (6) causal inference (5) neural network (4) sentiment analysis (4) natural language processing (3) pretrained language model (3) large language model (3) transfer learning (3) representation learning (3) ternary classification (2) benchmark dataset (2) concept-based explanation (2) instruction tuning (2) model alignment (2) distributed representation (2) attention mechanism (1) dataset creation (1) few-shot learning (1) multi-task learning (1)

Papers

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability JMLR 2025
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders ICML 2025
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models EMNLP 2024
ReFT: Representation Finetuning for Language Models NIPS 2024
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation ICML 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations ACL 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions NAACL 2024
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations CLEAR 2024
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions EMNLP 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca NIPS 2023
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training ACL 2023
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies EMNLP 2023
Rigorously Assessing Natural Language Explanations of Neurons EMNLP 2023
Causal Proxy Models for Concept-based Model Explanations ICML 2023
Causal Distillation for Language Models NAACL 2022
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior NIPS 2022
Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models ACL 2022
ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time NIPS 2022
Inducing Causal Structure for Interpretable Neural Networks ICML 2022
DynaSent: A Dynamic Benchmark for Sentiment Analysis IJCNLP 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis ACL 2021
Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis AAAI 2021
Dynabench: Rethinking Benchmarking in NLP NAACL 2021
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis EMNLP 2020