conftrace
Interpretability
7,318 papers
Papers per year
2003: 1
2006: 1
2007: 1
2008: 1
2009: 1
2010: 5
2012: 2
2013: 10
2014: 7
2015: 14
2016: 27
2017: 84
2018: 196
2019: 395
2020: 488
2021: 771
2022: 823
2023: 954
2024: 1360
2025: 1713
2026: 464
Papers
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models
EMNLP 2025
From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
EMNLP 2025
Do We Know What LLMs Don’t Know? A Study of Consistency in Knowledge Probing
EMNLP 2025
Probing Political Ideology in Large Language Models: How Latent Political Representations Generalize Across Tasks
EMNLP 2025
Understanding GUI Agent Localization Biases through Logit Sharpness
EMNLP 2025
No Black Boxes: Interpretable and Interactable Predictive Healthcare with Knowledge-Enhanced Agentic Causal Discovery
EMNLP 2025
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
EMNLP 2025
SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models
EMNLP 2025
PropXplain: Can LLMs Enable Explainable Propaganda Detection?
EMNLP 2025
Promptception: How Sensitive Are Large Multimodal Models to Prompts?
EMNLP 2025
Evaluating Compound AI Systems through Behaviors, Not Benchmarks
EMNLP 2025
Dissecting Persona-Driven Reasoning in Language Models via Activation Patching
EMNLP 2025
Do Before You Judge: Self-Reference as a Pathway to Better LLM Evaluation
EMNLP 2025
Mixed Signals: Decoding VLMs’ Reasoning and Underlying Bias in Vision-Language Conflict
EMNLP 2025
Mitigating Hallucination in Large Vision-Language Models through Aligning Attention Distribution to Information Flow
EMNLP 2025
Reliability Crisis of Reference-free Metrics for Grammatical Error Correction
EMNLP 2025
Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks
EMNLP 2025
Quantifying Uncertainty in Natural Language Explanations of Large Language Models for Question Answering
EMNLP 2025
SACL: Understanding and Combating Textual Bias in Code Retrieval with Semantic-Augmented Reranking and Localization
EMNLP 2025
Who’s the Author? How Explanations Impact User Reliance in AI-Assisted Authorship Attribution
EMNLP 2025
How to Generalize the Detection of AI-Generated Text: Confounding Neurons
EMNLP 2025
Saudi-Alignment Benchmark: Assessing LLMs Alignment with Cultural Norms and Domain Knowledge in the Saudi Context
EMNLP 2025
AraHalluEval: A Fine-grained Hallucination Evaluation Framework for Arabic LLMs
EMNLP 2025
IslamicEval 2025: The First Shared Task of Capturing LLMs Hallucination in Islamic Content
EMNLP 2025
Two ways into the hall of mirrors: Language exposure and lossy memory drive cross-linguistic grammaticality illusions in language models
EMNLP 2025