Mengnan Du

49 papers · 2021–2026 · 12 conferences · across top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (12) 🏃 Academic Marathon (5)

🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (51) 🏆 Keyword Champion (7) 🤝 Dynamic Duo (10) 🏆 Grand Slam 🧬 Topic Evolution 🔬 Deep Specialist (17) ❓ The Questioner 🔥 Unstoppable (5) 🗃️ Keyword Collector (153) ⚡ Prolific Year (10) 📈 Trend Setter 💎 Century Club (39)

Conferences

ACL (8) EMNLP (8) AAAI (6) EACL (5) ICML (5) ICLR (4) COLING (3) NAACL (3) NIPS (3) ACML (2) CVPR (1) IJCAI (1)

Top co-authors

Xia Hu (10) Haiyan Zhao (10) Dong Shu (9) Mingyu Jin (9) Fan Yang (9) Ninghao Liu (8) Ruixiang Tang (6) Yongfeng Zhang (6) Xuansheng Wu (5) Yu-Neng Chuang (4)

Research topics

Resources & Methods (1) Privacy (1)

Keywords

large language model (14) sparse autoencoder (7) deep neural network (4) spurious correlation (3) steering vector (3) backdoor attack (3) neural network (3) model steering (3) out-of-distribution generalization (3) feature extraction (3) language model (3) pre-trained language model (2) benchmark evaluation (2) model explanation (2) demonstration selection (2) representation learning (2) latent representation (2) chain-of-thought reasoning (2) shortcut learning (2) in-context learning (2)

Papers

KnowThyself: An Agentic Assistant for LLM Interpretability AAAI 2026
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models EACL 2026
FaithLM: Towards Faithful Explanations for Large Language Models EACL 2026
LLM Agents in Law: Taxonomy, Applications, and Challenges ACL 2026
FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models ACL 2026
FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction ACL 2026
AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling ACL 2026
Fine-Grained Interpretation of Political Opinions in Large Language Models AAAI 2026
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router EACL 2026
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering EACL 2026
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis ACL 2025
Improving LLM Reasoning through Interpretable Role-Playing Steering EMNLP 2025
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models EMNLP 2025
Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages AAAI 2025
Comparative Analysis of Demonstration Selection Algorithms for In-Context Learning in Large Language Models (Student Abstract) AAAI 2025
Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? COLING 2025
Invisible Backdoor Attack against Self-supervised Learning CVPR 2025
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability EMNLP 2025
From Commands to Prompts: LLM-based Semantic File System for AIOS ICLR 2025
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution ICLR 2025
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts EMNLP 2025
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders EMNLP 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models EMNLP 2025
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding ICML 2025
Concept-Centric Token Interpretation for Vector-Quantized Generative Models ICML 2025
Data-centric NLP Backdoor Defense from the Lens of Memorization NAACL 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models EMNLP 2025
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning EMNLP 2024
The Impact of Reasoning Step Length on Large Language Models ACL 2024
Data-Centric Explainable Debiasing for Improving Fairness in Pre-trained Language Models ACL 2024
Knowledge Graph Large Language Model (KG-LLM) for Link Prediction ACML 2024
DataFrame QA: A Universal LLM Framework on DataFrame Question Answering Without Data Exposure ACML 2024
Mitigating Shortcuts in Language Models with Soft Label Encoding COLING 2024
Unveiling Project-Specific Bias in Neural Code Models COLING 2024
Explaining Time Series via Contrastive and Locally Sparse Perturbations ICLR 2024
TVE: Learning Meta-attribution for Transferable Vision Explainer ICML 2024
Secure Your Model: An Effective Key Prompt Protection Mechanism for Large Language Models NAACL 2024
$\mathcal{M}^4$: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models NIPS 2023
Fairness via Group Contribution Matching IJCAI 2023
FAIRER: Fairness as Decision Rationale Alignment ICML 2023
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding EACL 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases ACL 2023
Black-box Backdoor Defense via Zero-shot Image Purification NIPS 2023
Towards Debiasing DNN Models from Spurious Feature Influence AAAI 2022
Accelerating Shapley Explanation via Contributive Cooperator Selection ICML 2022
DEGREE: Decomposition Based Explanation for Graph Neural Networks ICLR 2022
A Unified Taylor Framework for Revisiting Attribution Methods AAAI 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models NAACL 2021
Fairness via Representation Neutralization NIPS 2021