Research Explorer
Interpretability
7318 directly classified papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Explainability and Interpretability of Multilingual Large Language Models: A Survey
EMNLP 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
EMNLP 2025
EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models
EMNLP 2025
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
EMNLP 2025
Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary
EMNLP 2025
MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute Prompting
EMNLP 2025
“I’ve Decided to Leak”: Probing Internals Behind Prompt Leakage Intents
EMNLP 2025
Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
EMNLP 2025
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
EMNLP 2025
Unsupervised Hallucination Detection by Inspecting Reasoning Processes
EMNLP 2025
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
EMNLP 2025
Understanding Subword Compositionality of Large Language Models
EMNLP 2025
Internal Chain-of-Thought: Empirical Evidence for Layer‐wise Subtask Scheduling in LLMs
EMNLP 2025
Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models
EMNLP 2025
WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
EMNLP 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
EMNLP 2025
FLARE: Faithful Logic-Aided Reasoning and Exploration
EMNLP 2025
RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
EMNLP 2025
What’s in a prompt? Language models encode literary style in prompt embeddings
EMNLP 2025
Identifying and Answering Questions with False Assumptions: An Interpretable Approach
EMNLP 2025
LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
EMNLP 2025
From Language to Cognition: How LLMs Outgrow the Human Language Network
EMNLP 2025
Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates
EMNLP 2025
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
EMNLP 2025
LADDER: Language-Driven Slice Discovery and Error Rectification in Vision Classifiers
ACL 2025