conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Analyzing (In)Abilities of SAEs via Formal Languages
NAACL 2025
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
NAACL 2025
Verify-in-the-Graph: Entity Disambiguation Enhancement for Complex Claim Verification with Interactive Graph Representation
NAACL 2025
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors
NAACL 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
NAACL 2025
Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning
NAACL 2025
Sneaking Syntax into Transformer Language Models with Tree Regularization
NAACL 2025
Analyzing the Inner Workings of Transformers in Compositional Generalization
NAACL 2025
CAMIEval: Enhancing NLG Evaluation through Multidimensional Comparative Instruction-Following Analysis
NAACL 2025
CAVE: Controllable Authorship Verification Explanations
NAACL 2025
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
NAACL 2025
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs
NAACL 2025
MiCEval: Unveiling Multimodal Chain of Thought’s Quality via Image Description and Reasoning Steps
NAACL 2025
LLM-guided Plan and Retrieval: A Strategic Alignment for Interpretable User Satisfaction Estimation in Dialogue
NAACL 2025
Analyzing Memorization in Large Language Models through the Lens of Model Attribution
NAACL 2025
Main Predicate and Their Arguments as Explanation Signals For Intent Classification
NAACL 2025
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
NAACL 2025
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
NAACL 2025
One fish, two fish, but not the whole sea: Alignment reduces language models’ conceptual diversity
NAACL 2025
The Stochastic Parrot on LLM’s Shoulder: A Summative Assessment of Physical Concept Understanding
NAACL 2025
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation
NAACL 2025
Are explicit belief representations necessary? A comparison between Large Language Models and Bayesian probabilistic models
NAACL 2025
Characterizing the Role of Similarity in the Property Inferences of Language Models
NAACL 2025
MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools
NAACL 2025
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
NAACL 2025