conftrace_

Zhengxuan Wu

24 papers · 2020–2025 · 9 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (5)
🌍 Conference Polyglot (9)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (13)
🌈 Renaissance Researcher (5)
🗺️ Taxonomy Completionist (44)
🤝 Dynamic Duo (18)
❓ The Questioner
⚡ Prolific Year (5)
🗃️ Keyword Collector (93)
🔥 Unstoppable (6)
💎 Century Club (24)

Conferences

EMNLP (5) ACL (4) ICML (4) NIPS (4) NAACL (3) AAAI (1) CLEAR (1) IJCNLP (1) JMLR (1)

Papers

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability · JMLR 2025
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders · ICML 2025
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models · EMNLP 2024
ReFT: Representation Finetuning for Language Models · NIPS 2024
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation · ICML 2024
RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations · ACL 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions · NAACL 2024
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations · CLEAR 2024
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions · EMNLP 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca · NIPS 2023
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training · ACL 2023
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies · EMNLP 2023
Rigorously Assessing Natural Language Explanations of Neurons · EMNLP 2023
Causal Proxy Models for Concept-based Model Explanations · ICML 2023
Causal Distillation for Language Models · NAACL 2022
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior · NIPS 2022
Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models · ACL 2022
ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time · NIPS 2022
Inducing Causal Structure for Interpretable Neural Networks · ICML 2022
DynaSent: A Dynamic Benchmark for Sentiment Analysis · IJCNLP 2021
DynaSent: A Dynamic Benchmark for Sentiment Analysis · ACL 2021
Context-Guided BERT for Targeted Aspect-Based Sentiment Analysis · AAAI 2021
Dynabench: Rethinking Benchmarking in NLP · NAACL 2021
Structured Self-Attention Weights Encode Semantics in Sentiment Analysis · EMNLP 2020