Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
EMNLP 2025
Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
EMNLP 2025
Data Descriptions from Large Language Models with Influence Estimation
EMNLP 2025
Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations
EMNLP 2025
LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
EMNLP 2025
From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors
EMNLP 2025
AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
EMNLP 2025
The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
EMNLP 2025
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
EMNLP 2025
NormXLogit: The Head-on-Top Never Lies
EMNLP 2025
FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
EMNLP 2025
A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs
EMNLP 2025
Quantifying Logical Consistency in Transformers via Query-Key Alignment
EMNLP 2025
CourtReasoner: Can LLM Agents Reason Like Judges?
EMNLP 2025
Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews
EMNLP 2025
Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models
EMNLP 2025
LingConv: An Interactive Toolkit for Controlled Paraphrase Generation with Linguistic Attribute Control
EMNLP 2025
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
EMNLP 2025
CafGa: Customizing Feature Attributions to Explain Language Models
EMNLP 2025
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
EMNLP 2025
AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment
EMNLP 2025
o-MEGA: Optimized Methods for Explanation Generation and Analysis
EMNLP 2025
TRACE: Training and Inference-Time Interpretability Analysis for Language Models
EMNLP 2025
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens
EMNLP 2025
Hybrid Concept Bottleneck Models
CVPR 2025