Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
NAACL 2025
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
EMNLP 2025
SafeQuant: LLM Safety Analysis via Quantized Gradient Inspection
NAACL 2025
Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction
EMNLP 2025
Who’s the Author? How Explanations Impact User Reliance in AI-Assisted Authorship Attribution
EMNLP 2025
Neural Reasoning Networks: Efficient Interpretable Neural Networks with Automatic Textual Explanations
AAAI 2025
Things Machine Learning Models Know That They Don’t Know
AAAI 2025
Interpretable Sparse Features for Probing Self-Supervised Speech Models
IJCNLP 2025
Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge
IJCNLP 2025
Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency
WACV 2025
MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps
NAACL 2025
Decoding Emergent Big Five Traits in Large Language Models: Temperature-Dependent Expression and Architectural Clustering
IJCNLP 2025
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
EMNLP 2025
IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
IJCNLP 2025
CLEV: LLM-Based Evaluation Through Lightweight Efficient Voting for Free-Form Question-Answering
IJCNLP 2025
Calibration Across Layers: Understanding Calibration Evolution in LLMs
EMNLP 2025
To Generate or Discriminate? Methodological Considerations for Measuring Cultural Alignment in LLMs
IJCNLP 2025
Read Between the Lines: A Benchmark for Uncovering Political Bias in Bangla News Articles
IJCNLP 2025
The discordance between embedded ethics and cultural inference in large language models
EMNLP 2025
SHARP: Steering Hallucination in LVLMs via Representation Engineering
EMNLP 2025
Surprisal Dynamics for the Detection of Multi-Word Expressions in English
IJCNLP 2025
Unveiling the Influence of Amplifying Language-Specific Neurons
IJCNLP 2025
Multi-Domain Explainability of Preferences
EMNLP 2025
Learning from Hallucinations: Mitigating Hallucinations in LLMs via Internal Representation Intervention
IJCNLP 2025
Modular Arithmetic: Language Models Solve Math Digit by Digit
IJCNLP 2025