conftrace_

Interpretability

7,318 papers

Papers

Thesis Distillation: Investigating The Impact of Bias in NLP Models on Hate Speech Detection EMNLP 2023

Analyzing Pre-trained and Fine-tuned Language Models EMNLP 2023

Emergent Linear Representations in World Models of Self-Supervised Sequence Models EMNLP 2023

Explaining Data Patterns in Natural Language with Language Models EMNLP 2023

Disentangling the Linguistic Competence of Privacy-Preserving BERT EMNLP 2023

“Honey, Tell Me What’s Wrong”, Global Explanation of Textual Discriminative Models through Cooperative Generation EMNLP 2023

Self-Consistency of Large Language Models under Ambiguity EMNLP 2023

Character-Level Chinese Backpack Language Models EMNLP 2023

Investigating Semantic Subspaces of Transformer Sentence Embeddings through Linear Structural Probing EMNLP 2023

Enhancing Interpretability Using Human Similarity Judgements to Prune Word Embeddings EMNLP 2023

When Your Language Model Cannot Even Do Determiners Right: Probing for Anti-Presuppositions and the Maximize Presupposition! Principle EMNLP 2023

Introducing VULCAN: A Visualization Tool for Understanding Our Models and Data by Example EMNLP 2023

Investigating the Effect of Discourse Connectives on Transformer Surprisal: Language Models Understand Connectives, Even So They Are Surprised EMNLP 2023

Investigating the Encoding of Words in BERT’s Neurons Using Feature Textualization EMNLP 2023

Rigorously Assessing Natural Language Explanations of Neurons EMNLP 2023

Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model EMNLP 2023

Humans and language models diverge when predicting repeating text EMNLP 2023

A Comparative Study on Textual Saliency of Styles from Eye Tracking, Annotations, and Language Models EMNLP 2023

Revising with a Backward Glance: Regressions and Skips during Reading as Cognitive Signals for Revision Policies in Incremental Processing EMNLP 2023

Future Lens: Anticipating Subsequent Tokens from a Single Hidden State EMNLP 2023

Implications of Annotation Artifacts in Edge Probing Test Datasets EMNLP 2023

REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization EMNLP 2023

Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models EMNLP 2023

Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs EMNLP 2023

Evaluating Neural Language Models as Cognitive Models of Language Acquisition EMNLP 2023