Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Attributive Reasoning for Hallucination Diagnosis of Large Language Models
AAAI 2025
SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses
AAAI 2025
Tuning-Free Accountable Intervention for LLM Deployment – a Metacognitive Approach
AAAI 2025
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
AAAI 2025
Cooperative or Competitive? Understanding the Interaction between Attention Heads From A Game Theory Perspective
ACL 2025
MEDDxAgent: A Unified Modular Agent Framework for Explainable Automatic Differential Diagnosis
ACL 2025
Benchmarking and Understanding Compositional Relational Reasoning of LLMs
AAAI 2025
Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention
ACL 2025
Beyond Surface Simplicity: Revealing Hidden Reasoning Attributes for Precise Commonsense Diagnosis
ACL 2025
Calibrating Large Language Models with Sample Consistency
AAAI 2025
Beyond Accuracy: On the Effects of Fine-Tuning Towards Vision-Language Model’s Prediction Rationality
AAAI 2025
Knowledge-Augmented Multimodal Clinical Rationale Generation for Disease Diagnosis with Small Language Models
ACL 2025
The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
ACL 2025
FLUE: Streamlined Uncertainty Estimation for Large Language Models
AAAI 2025
Cracking Factual Knowledge: A Comprehensive Analysis of Degenerate Knowledge Neurons in Large Language Models
ACL 2025
CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction
ACL 2025
Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments
AAAI 2025
Extracting Interpretable Task-Specific Circuits from Large Language Models for Faster Inference
AAAI 2025
Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
AAAI 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
ACL 2025
Comparing LLM-generated and human-authored news text using formal syntactic theory
ACL 2025
Quality-Informed Segment-Level Error Correction Using Natural Language Explanations from xTower and Large Language Models
EMNLP 2025
Circuit Stability Characterizes Language Model Generalization
ACL 2025
Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice
ACL 2025
Targeted Source Text Editing for Machine Translation: Exploiting Quality Estimators and Large Language Models
EMNLP 2025