Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
TaeBench: Improving Quality of Toxic Adversarial Examples
NAACL 2025
TactfulToM: Do LLMs have the Theory of Mind ability to understand White Lies?
EMNLP 2025
Trust Me, I’m Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
EMNLP 2025
Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization
EMNLP 2025
Can Multiple Responses from an LLM Reveal the Sources of Its Uncertainty?
EMNLP 2025
Sparse Activation Editing for Reliable Instruction Following in Narratives
EMNLP 2025
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
EMNLP 2025
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
EMNLP 2025
Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements
EMNLP 2025
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
EMNLP 2025
Great Memory, Shallow Reasoning: Limits of kNN-LMs
NAACL 2025
Morables: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
EMNLP 2025
Do RAG Systems Really Suffer From Positional Bias?
EMNLP 2025
Improving Large Language Model Safety with Contrastive Representation Learning
EMNLP 2025
Leveraging What’s Overfixed: Post-Correction via LLM Grammatical Error Overcorrection
EMNLP 2025
LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder
EMNLP 2025
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
EMNLP 2025
Unsupervised Concept Vector Extraction for Bias Control in LLMs
EMNLP 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
EMNLP 2025
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
EMNLP 2025
Probing Narrative Morals: A New Character-Focused MFT Framework for Use with Large Language Models
EMNLP 2025
Probing and Boosting Large Language Models Capabilities via Attention Heads
EMNLP 2025
Explaining Differences Between Model Pairs in Natural Language through Sample Learning
EMNLP 2025
Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models
EMNLP 2025
Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models
EMNLP 2025