conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
How to Make LLMs Forget: On Reversing In-Context Knowledge Edits
NAACL 2025
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can’t Answer?
NAACL 2025
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
NAACL 2025
Language Models Encode Numbers Using Digit Representations in Base 10
NAACL 2025
GameTox: A Comprehensive Dataset and Analysis for Enhanced Toxicity Detection in Online Gaming Communities
NAACL 2025
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
NAACL 2025
Great Memory, Shallow Reasoning: Limits of kNN-LMs
NAACL 2025
Repetition Neurons: How Do Language Models Produce Repetitions?
NAACL 2025
Task-driven Layerwise Additive Activation Intervention
NAACL 2025
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models
NAACL 2025
DART: An AIGT Detector using AMR of Rephrased Text
NAACL 2025
Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm
NAACL 2025
TaeBench: Improving Quality of Toxic Adversarial Examples
NAACL 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
NAACL 2025
Developing a Reliable, Fast, General-Purpose Hallucination Detection and Mitigation Service
NAACL 2025
Detecting Sexism in Tweets: A Sentiment Analysis and Graph Neural Network Approach
NAACL 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
NAACL 2025
Linear Relational Decoding of Morphology in Language Models
NAACL 2025
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
NAACL 2025
Representing and Clustering Errors in Offensive Language Detection
NAACL 2025
Do Video Language Models really understand the video contexts?
NAACL 2025
Towards LLMs Robustness to Changes in Prompt Format Styles
NAACL 2025
A Sentence-Level Visualization of Attention in Large Language Models
NAACL 2025
FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text
NAACL 2025
HALLUCANA: Fixing LLM Hallucination with A Canary Lookahead
NAACL 2025