Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
On Localizing and Deleting Toxic Memories in Large Language Models
NAACL 2025
Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
NAACL 2025
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
EMNLP 2025
COGUMELO at SemEval-2025 Task 3: A Synthetic Approach to Detecting Hallucinations in Language Models based on Named Entity Recognition
ACL 2025
Analyzing (In)Abilities of SAEs via Formal Languages
NAACL 2025
Analyzing the Inner Workings of Transformers in Compositional Generalization
NAACL 2025
FiRC-NLP at SemEval-2025 Task 3: Exploring Prompting Approaches for Detecting Hallucinations in LLMs
ACL 2025
TaeBench: Improving Quality of Toxic Adversarial Examples
NAACL 2025
Linear Relational Decoding of Morphology in Language Models
NAACL 2025
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
NAACL 2025
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models
EMNLP 2025
Read Your Own Mind: Reasoning Helps Surface Self-Confidence Signals in LLMs
EMNLP 2025
ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning
EMNLP 2025
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
NAACL 2025
Explainability and Interpretability of Multilingual Large Language Models: A Survey
EMNLP 2025
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
EMNLP 2025
Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
EMNLP 2025
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation
NAACL 2025
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
EMNLP 2025
Counterfactual Explanations for Continuous Action Reinforcement Learning
IJCAI 2025
Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions
EMNLP 2025
Rule-Guided Reinforcement Learning Policy Evaluation and Improvement
IJCAI 2025
A Graph-Theoretical Framework for Analyzing the Behavior of Causal Language Models
EMNLP 2025
NSF-MAP: Neurosymbolic Multimodal Fusion for Robust and Interpretable Anomaly Prediction in Assembly Pipelines
IJCAI 2025
PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
EMNLP 2025