conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Linguistically Grounded Analysis of Language Models using Shapley Head Values
NAACL 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
NAACL 2025
Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models
NAACL 2025
From Argumentation to Deliberation: Perspectivized Stance Vectors for Fine-grained (Dis)agreement Analysis
NAACL 2025
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
NAACL 2025
Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation
NAACL 2025
Evaluating Vision-Language Models for Emotion Recognition
NAACL 2025
Open Domain Question Answering with Conflicting Contexts
NAACL 2025
SAFR: Neuron Redistribution for Interpretability
NAACL 2025
Attention Tracker: Detecting Prompt Injection Attacks in LLMs
NAACL 2025
Neuro-symbolic Training for Reasoning over Spatial Language
NAACL 2025
On Localizing and Deleting Toxic Memories in Large Language Models
NAACL 2025
Classic4Children: Adapting Chinese Literary Classics for Children with Large Language Model
NAACL 2025
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
NAACL 2025
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification
NAACL 2025
A Practical Method for Generating String Counterfactuals
NAACL 2025
Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
NAACL 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
NAACL 2025
Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax
NAACL 2025
Efficient Nearest Neighbor based Uncertainty Estimation for Natural Language Processing Tasks
NAACL 2025
From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs
NAACL 2025
Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning
NAACL 2025
On the Feasibility of In-Context Probing for Data Attribution
NAACL 2025
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs
NAACL 2025
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
NAACL 2025