Interpretability

7,318 papers

Papers

Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles NIPS 2024

FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors NIPS 2024

Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP NIPS 2024

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models NIPS 2024

Improving Alignment and Robustness with Circuit Breakers NIPS 2024

Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory NIPS 2024

Dissect Black Box: Interpreting for Rule-Based Explanations in Unsupervised Anomaly Detection NIPS 2024

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) NIPS 2024

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space NIPS 2024

Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable? NIPS 2024

A provable control of sensitivity of neural networks through a direct parameterization of the overall bi-Lipschitzness NIPS 2024

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) NIPS 2024

Large Language Models Must Be Taught to Know What They Don’t Know NIPS 2024

Neural Model Checking NIPS 2024

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model NIPS 2024

Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models NIPS 2024

Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering NIPS 2024

Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex. NIPS 2024

Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox NIPS 2024

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics NIPS 2024

The Intelligible and Effective Graph Neural Additive Network NIPS 2024

FEEL-SNN: Robust Spiking Neural Networks with Frequency Encoding and Evolutionary Leak Factor NIPS 2024

Abstracted Shapes as Tokens - A Generalizable and Interpretable Model for Time-series Classification NIPS 2024

Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning NIPS 2024

InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques NIPS 2024