conftrace_

Interpretability

7,318 papers

Papers

Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability EMNLP 2025

Attention Consistency for LLMs Explanation EMNLP 2025

Evaluating Step-by-step Reasoning Traces: A Survey EMNLP 2025

A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks EMNLP 2025

Towards Achieving Concept Completeness for Textual Concept Bottleneck Models EMNLP 2025

Table-Text Alignment: Explaining Claim Verification Against Tables in Scientific Papers EMNLP 2025

When Format Changes Meaning: Investigating Semantic Inconsistency of Large Language Models EMNLP 2025

LMUNIT: Fine-grained Evaluation with Natural Language Unit Tests EMNLP 2025

Can We Steer Reasoning Direction by Thinking Intervention? EMNLP 2025

FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning EMNLP 2025

How Does Cognitive Bias Affect Large Language Models? A Case Study on the Anchoring Effect in Price Negotiation Simulations EMNLP 2025

Multi-level Diagnosis and Evaluation for Robust Tabular Feature Engineering with Large Language Models EMNLP 2025

Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning EMNLP 2025

Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks EMNLP 2025

Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent? EMNLP 2025

X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Jailbreak Attacks without Compromising Usability EMNLP 2025

Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack EMNLP 2025

The “r” in “woman” stands for rights. Auditing LLMs in Uncovering Social Dynamics in Implicit Misogyny EMNLP 2025

LLM Jailbreak Detection for (Almost) Free! EMNLP 2025

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation EMNLP 2025

Understanding Refusal in Language Models with Sparse Autoencoders EMNLP 2025

Where Did That Come From? Sentence-Level Error-Tolerant Attribution EMNLP 2025

Explaining Length Bias in LLM-Based Preference Evaluations EMNLP 2025

How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts? EMNLP 2025

From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval EMNLP 2025