conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
CE-Bench: Towards a Reliable Contrastive Evaluation Benchmark of Interpretability of Sparse Autoencoders
EMNLP 2025
Evil twins are not that evil: Qualitative insights into machine-generated prompts
EMNLP 2025
Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2
EMNLP 2025
Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models
EMNLP 2025
Interpreting Language Models Through Concept Descriptions: A Survey
EMNLP 2025
When LRP Diverges from Leave-One-Out in Transformers
EMNLP 2025
Circuit-Tracer: A New Library for Finding Feature Circuits
EMNLP 2025
Mechanistic Fine-tuning for In-context Learning
EMNLP 2025
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering
EMNLP 2025
The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models
EMNLP 2025
Exploring Large Language Models’ World Perception: A Multi-Dimensional Evaluation through Data Distribution
EMNLP 2025
On the Representations of Entities in Auto-regressive Large Language Models
EMNLP 2025
Can Language Neuron Intervention Reduce Non-Target Language Output?
EMNLP 2025
Fine-Grained Manipulation of Arithmetic Neurons
EMNLP 2025
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
EMNLP 2025
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
EMNLP 2025
BlackboxNLP-2025 MIB Shared Task: IPE: Isolating Path Effects for Improving Latent Circuit Identification
EMNLP 2025
BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
EMNLP 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models
EMNLP 2025
Unpacking Ambiguity: The Interaction of Polysemous Discourse Markers and Non-DM Signals
EMNLP 2025
Entity Tracking in Small Language Models: An Attention-Based Study of Parameter-Efficient Fine-Tuning
EMNLP 2025
TripleCheck: Transparent Post-Hoc Verification of Biomedical Claims in AI-Generated Answers
EMNLP 2025
Predictive Modeling of Human Developers’ Evaluative Judgment of Generated Code as a Decision Process
EMNLP 2025
From Regulation to Interaction: Expert Views on Aligning Explainable AI with the EU AI Act
EMNLP 2025
FIRMA: Bidirectional Formal-Informal Mathematical Language Alignment with Proof-Theoretic Grounding
EMNLP 2025