Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Interpretability (Artificial Intelligence › Core AI)
7,318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Multi-Attribute Steering of Language Models via Targeted Intervention
ACL 2025
Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension
ACL 2025
Efficient Counterexample-Guided Fairness Verification and Repair of Neural Networks Using Satisfiability Modulo Convex Programming
IJCAI 2025
How well do LLMs reason over tabular data, really?
ACL 2025
Vulnerability of LLMs to Vertically Aligned Text Manipulations
ACL 2025
Steering off Course: Reliability Challenges in Steering Language Models
ACL 2025
FiRC-NLP at SemEval-2025 Task 3: Exploring Prompting Approaches for Detecting Hallucinations in LLMs
ACL 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
ACL 2025
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs
ACL 2025
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
CVPR 2025
COGUMELO at SemEval-2025 Task 3: A Synthetic Approach to Detecting Hallucinations in Language Models based on Named Entity Recognition
ACL 2025
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
ACL 2025
Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing
ACL 2025
Hierarchical Attention Generates Better Proofs
ACL 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View
ACL 2025
Neuron-Level Sequential Editing for Large Language Models
ACL 2025
Interpretable Generative Models through Post-hoc Concept Bottlenecks
CVPR 2025
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
CVPR 2025
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
ACL 2025
IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory
ACL 2025
Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies
IJCNLP 2025
HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
ACL 2025
Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration?
ACL 2025
Inner Information Analysis Algorithm for Deep Neural Network based on Community
ICLR 2025