Mengnan Du
49 papers · 2021–2026 · 12 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (13)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (12)
Academic Marathon (5)
Renaissance Researcher (5)
Taxonomy Completionist (51)
Keyword Champion (7)
Dynamic Duo (10)
Grand Slam
Topic Evolution
Deep Specialist (17)
The Questioner
Unstoppable (5)
Keyword Collector (153)
Prolific Year (10)
Trend Setter
Century Club (39)
Conferences
ACL (8)
EMNLP (8)
AAAI (6)
EACL (5)
ICML (5)
ICLR (4)
COLING (3)
NAACL (3)
NIPS (3)
ACML (2)
CVPR (1)
IJCAI (1)
Keywords
large language model (14)
sparse autoencoder (7)
deep neural network (4)
spurious correlation (3)
steering vector (3)
backdoor attack (3)
neural network (3)
model steering (3)
out-of-distribution generalization (3)
feature extraction (3)
language model (3)
pre-trained language model (2)
benchmark evaluation (2)
model explanation (2)
demonstration selection (2)
representation learning (2)
latent representation (2)
chain-of-thought reasoning (2)
shortcut learning (2)
in-context learning (2)
Papers
KnowThyself: An Agentic Assistant for LLM Interpretability
AAAI 2026
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
EACL 2026
FaithLM: Towards Faithful Explanations for Large Language Models
EACL 2026
LLM Agents in Law: Taxonomy, Applications, and Challenges
ACL 2026
FinChart-Bench: Benchmarking Financial Chart Comprehension in Vision-Language Models
ACL 2026
FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
ACL 2026
AdaJudge: Adaptive Multi-Perspective Judging for Reward Modeling
ACL 2026
Fine-Grained Interpretation of Political Opinions in Large Language Models
AAAI 2026
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router
EACL 2026
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
EACL 2026
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
ACL 2025
Improving LLM Reasoning through Interpretable Role-Playing Steering
EMNLP 2025
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
EMNLP 2025
Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages
AAAI 2025
Comparative Analysis of Demonstration Selection Algorithms for In-Context Learning in Large Language Models (Student Abstract)
AAAI 2025
Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers?
COLING 2025
Invisible Backdoor Attack against Self-supervised Learning
CVPR 2025
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
EMNLP 2025
From Commands to Prompts: LLM-based Semantic File System for AIOS
ICLR 2025
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
ICLR 2025
ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts
EMNLP 2025
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
EMNLP 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
EMNLP 2025
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
ICML 2025
Concept-Centric Token Interpretation for Vector-Quantized Generative Models
ICML 2025
Data-centric NLP Backdoor Defense from the Lens of Memorization
NAACL 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
EMNLP 2025
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
EMNLP 2024
The Impact of Reasoning Step Length on Large Language Models
ACL 2024
Data-Centric Explainable Debiasing for Improving Fairness in Pre-trained Language Models
ACL 2024
Knowledge Graph Large Language Model (KG-LLM) for Link Prediction
ACML 2024
DataFrame QA: A Universal LLM Framework on DataFrame Question Answering Without Data Exposure
ACML 2024
Mitigating Shortcuts in Language Models with Soft Label Encoding
COLING 2024
Unveiling Project-Specific Bias in Neural Code Models
COLING 2024
Explaining Time Series via Contrastive and Locally Sparse Perturbations
ICLR 2024
TVE: Learning Meta-attribution for Transferable Vision Explainer
ICML 2024
Secure Your Model: An Effective Key Prompt Protection Mechanism for Large Language Models
NAACL 2024
$\mathcal{M}^4$: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models
NIPS 2023
Fairness via Group Contribution Matching
IJCAI 2023
FAIRER: Fairness as Decision Rationale Alignment
ICML 2023
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
EACL 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
ACL 2023
Black-box Backdoor Defense via Zero-shot Image Purification
NIPS 2023
Towards Debiasing DNN Models from Spurious Feature Influence
AAAI 2022
Accelerating Shapley Explanation via Contributive Cooperator Selection
ICML 2022
DEGREE: Decomposition Based Explanation for Graph Neural Networks
ICLR 2022
A Unified Taylor Framework for Revisiting Attribution Methods
AAAI 2021
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
NAACL 2021
Fairness via Representation Neutralization
NIPS 2021