Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Reasoning to Attend: Try to Understand How <SEG> Token Works
CVPR 2025
Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers
ACL 2025
Sequential Conditional Transport on Probabilistic Graphs for Interpretable Counterfactual Fairness
AAAI 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025
Connecting Concept Layers and Rationales to Enhance Language Model Interpretability
EMNLP 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
Does Your AI Agent Get You? A Personalizable Framework for Approximating Human Models from Argumentation-based Dialogue Traces
AAAI 2025
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
ACL 2025
Active Fourier Auditor for Estimating Distributional Properties of ML Models
AAAI 2025
Dynamic Head Selection for Neural Lexicalized Constituency Parsing
ACL 2025
Unlocking the Game: Estimating Games in Möbius Representation for Explanation and High-Order Interaction Detection
AAAI 2025
Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis
ACL 2025
Extracting PAC Decision Trees from Black Box Binary Classifiers: The Gender Bias Study Case on BERT-based Language Models
AAAI 2025
Mamba Knockout for Unraveling Factual Information Flow
ACL 2025
EXCGEC: A Benchmark for Edit-Wise Explainable Chinese Grammatical Error Correction
AAAI 2025
Language Models Grow Less Humanlike beyond Phase Transition
ACL 2025
Uncertainty-aware Knowledge Tracing
AAAI 2025
PRISM: A Framework for Producing Interpretable Political Bias Embeddings with Political-Aware Cross-Encoder
ACL 2025
Do Large Language Models Know When Not to Answer in Medical QA?
EMNLP 2025
IRIS: Interpretable Retrieval-Augmented Classification for Long Interspersed Document Sequences
ACL 2025
An XAI Social Media Platform for Teaching K-12 Students AI-Driven Profiling, Clustering, and Engagement-Based Recommending
AAAI 2025
Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
ACL 2025
Certain but not Probable? Differentiating Certainty from Probability in LLM Token Outputs for Probabilistic Scenarios
EMNLP 2025
Don’t Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
ACL 2025
Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
EMNLP 2025