conftrace_

Interpretability

7,318 papers

Papers

A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check EMNLP 2023

Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture EMNLP 2023

Error Detection for Text-to-SQL Semantic Parsing EMNLP 2023

Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate EMNLP 2023

IAEval: A Comprehensive Evaluation of Instance Attribution on Natural Language Understanding EMNLP 2023

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations EMNLP 2023

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text EMNLP 2023

Adversarial Robustness for Large Language NER models using Disentanglement and Word Attributions EMNLP 2023

Probing Representations for Document-level Event Extraction EMNLP 2023

Language-Agnostic Bias Detection in Language Models with Bias Probing EMNLP 2023

NLMs: Augmenting Negation in Language Models EMNLP 2023

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training EMNLP 2023

Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs EMNLP 2023

Rethinking the Construction of Effective Metrics for Understanding the Mechanisms of Pretrained Language Models EMNLP 2023

Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization EMNLP 2023

VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers EMNLP 2023

Is the Answer in the Text? Challenging ChatGPT with Evidence Retrieval from Instructive Text EMNLP 2023

Unnatural language processing: How do language models handle machine-generated prompts? EMNLP 2023

A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing EMNLP 2023

Emptying the Ocean with a Spoon: Should We Edit Models? EMNLP 2023

Learning Interpretable Style Embeddings via Prompting LLMs EMNLP 2023

SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency EMNLP 2023

Detecting Argumentative Fallacies in the Wild: Problems and Limitations of Large Language Models EMNLP 2023

Constituency Tree Representation for Argument Unit Recognition EMNLP 2023

Unsupervised argument reframing with a counterfactual-based approach EMNLP 2023