conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles
NIPS 2024
FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors
NIPS 2024
Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP
NIPS 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
NIPS 2024
Improving Alignment and Robustness with Circuit Breakers
NIPS 2024
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
NIPS 2024
Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection
NIPS 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)
NIPS 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
NIPS 2024
Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
NIPS 2024
A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
Large Language Models Must Be Taught to Know What They Don’t Know
NIPS 2024
Neural Model Checking
NIPS 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
NIPS 2024
Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering
NIPS 2024
Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex
NIPS 2024
Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
NIPS 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
NIPS 2024
The Intelligible and Effective Graph Neural Additive Network
NIPS 2024
FEEL-SNN: Robust Spiking Neural Networks with Frequency Encoding and Evolutionary Leak Factor
NIPS 2024
Abstracted Shapes as Tokens - A Generalizable and Interpretable Model for Time-series Classification
NIPS 2024
Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning
NIPS 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
NIPS 2024