Interpretability

7,318 papers

Papers

Stochastic Concept Bottleneck Models NIPS 2024

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models NIPS 2024

Uncertainty-aware Fine-tuning of Segmentation Foundation Models NIPS 2024

Denoising Diffusion Path: Attribution Noise Reduction with An Auxiliary Diffusion Model NIPS 2024

Transformer Doctor: Diagnosing and Treating Vision Transformers NIPS 2024

BrainBits: How Much of the Brain are Generative Reconstruction Methods Using? NIPS 2024

What Rotary Position Embedding Can Tell Us: Identifying Query and Key Weights Corresponding to Basic Syntactic or High-level Semantic Information NIPS 2024

Dissecting Query-Key Interaction in Vision Transformers NIPS 2024

Selective Explanations NIPS 2024

Do Counterfactually Fair Image Classifiers Satisfy Group Fairness? -- A Theoretical and Empirical Study NIPS 2024

Spectral Editing of Activations for Large Language Model Alignment NIPS 2024

Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure NIPS 2024

A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization NIPS 2024

To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty NIPS 2024

Conformalized Multiple Testing after Data-dependent Selection NIPS 2024

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses NIPS 2024

Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models NIPS 2024

B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable NIPS 2024

T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models NIPS 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs NIPS 2024

Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems NIPS 2024

Listenable Maps for Zero-Shot Audio Classifiers NIPS 2024

Chain-of-Thought Reasoning Without Prompting NIPS 2024

Training Data Attribution via Approximate Unrolling NIPS 2024

Interpretable Concept Bottlenecks to Align Reinforcement Learning Agents NIPS 2024