conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Stochastic Concept Bottleneck Models
NIPS 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
NIPS 2024
Uncertainty-aware Fine-tuning of Segmentation Foundation Models
NIPS 2024
Denoising Diffusion Path: Attribution Noise Reduction with An Auxiliary Diffusion Model
NIPS 2024
Transformer Doctor: Diagnosing and Treating Vision Transformers
NIPS 2024
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?
NIPS 2024
What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information
NIPS 2024
Dissecting Query-Key Interaction in Vision Transformers
NIPS 2024
Selective Explanations
NIPS 2024
Do Counterfactually Fair Image Classifiers Satisfy Group Fairness? -- A Theoretical and Empirical Study
NIPS 2024
Spectral Editing of Activations for Large Language Model Alignment
NIPS 2024
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure
NIPS 2024
A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization
NIPS 2024
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
NIPS 2024
Conformalized Multiple Testing after Data-dependent Selection
NIPS 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
NIPS 2024
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
NIPS 2024
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable
NIPS 2024
T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models
NIPS 2024
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
NIPS 2024
Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems
NIPS 2024
Listenable Maps for Zero-Shot Audio Classifiers
NIPS 2024
Chain-of-Thought Reasoning Without Prompting
NIPS 2024
Training Data Attribution via Approximate Unrolling
NIPS 2024
Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents
NIPS 2024