Yonatan Belinkov
101 papers · 2013–2026 · 14 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (14)
Academic Marathon (12)
Keyword Pioneer
Interdisciplinary Bridge
Hot Topic Early Bird
Cross-Pollinator (7)
Keyword Trendsetter Combo (5)
Conference Loyalist (24)
Mega-Team (61)
Grand Slam
Dynamic Duo (14)
Topic Pioneer
Deep Specialist (28)
Topic Evolution
Keyword Champion
The Questioner (4)
Trend Setter
Keyword Collector (328)
Unstoppable (11)
Century Club (97)
Prolific Year (7)
Conference Pioneer
Conferences
ACL (28)
EMNLP (17)
ICLR (16)
NAACL (13)
NIPS (7)
AAAI (6)
EACL (3)
IJCNLP (3)
INTERSPEECH (2)
SEMEVAL (2)
COLING (1)
ICCV (1)
ICML (1)
WACV (1)
Keywords
representation learning (18)
language model (13)
attention mechanism (9)
neural machine translation (9)
neural network (9)
natural language inference (8)
bias mitigation (5)
model editing (4)
neuron analysis (4)
emergent communication (4)
diffusion model (4)
mechanistic interpretability (4)
large language model (4)
out-of-distribution generalization (4)
model interpretability (4)
causal mediation analysis (4)
domain adaptation (3)
transfer learning (3)
domain generalization (3)
attention head (3)
Papers
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness
ACL 2026
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
ACL 2026
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
ACL 2026
Mechanisms of Prompt-Induced Hallucination in Vision–Language Models
ACL 2026
CtD: Composition through Decomposition in Emergent Communication
ICLR 2025
REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
ACL 2025
Position-aware Automatic Circuit Discovery
ACL 2025
Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
EMNLP 2025
SAEs Are Good for Steering – If You Select the Right Features
EMNLP 2025
Trust Me, I'm Wrong: LLMs Hallucinate with Certainty Despite Knowing the Answer
EMNLP 2025
Unsupervised Translation of Emergent Communication
AAAI 2025
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
EMNLP 2025
DEPTH: Discourse Education through Pre-Training Hierarchically
NAACL 2025
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
NAACL 2025
Findings of the BlackboxNLP 2025 Shared Task: Localizing Circuits and Causal Variables in Language Models
EMNLP 2025
BlackboxNLP-2025 MIB Shared Task: Improving Circuit Faithfulness via Better Edge Selection
EMNLP 2025
Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions
ICLR 2025
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics
ICLR 2025
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
ICLR 2025
Jamba: Hybrid Transformer-Mamba Language Models
ICLR 2025
MIB: A Mechanistic Interpretability Benchmark
ICML 2025
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
ICLR 2025
Unified Concept Editing in Diffusion Models
WACV 2024
Linearity of Relation Decoding in Transformer Language Models
ICLR 2024
Semantics and Spatiality of Emergent Communication
NIPS 2024
Confidence Regulation Neurons in Language Models
NIPS 2024
Accelerating the Global Aggregation of Local Explanations
AAAI 2024
Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information
NAACL 2024
ContraSim – Analyzing Neural Representations Based on Contrastive Learning
NAACL 2024
ReFACT: Updating Text-to-Image Models by Editing the Text Encoder
NAACL 2024
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
ACL 2024
Concept-Best-Matching: Evaluating Compositionality In Emergent Communication
ACL 2024
Learning from Others: Similarity-based Regularization for Mitigating Dataset Bias
ACL 2024
Generating Benchmarks for Factuality Evaluation of Language Models
EACL 2024
A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry
EACL 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
ICLR 2024
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
EMNLP 2024
Fast Forwarding Low-Rank Training
EMNLP 2024
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
EMNLP 2023
BLIND: Bias Removal With No Demographics
ACL 2023
Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection
ACL 2023
Multiple sequence alignment as a sequence-to-sequence learning problem
ICLR 2023
Mass-Editing Memory in a Transformer
ICLR 2023
Editing Implicit Assumptions in Text-to-Image Diffusion Models
ICCV 2023
Parallel Context Windows for Large Language Models
ACL 2023
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
ACL 2023
Emergent Quantized Communication
AAAI 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
EMNLP 2023
When Language Models Fall in Love: Animacy Processing in Transformer Language Models
EMNLP 2023
How Gender Debiasing Affects Internal Model Representations, and Why It Matters
NAACL 2022
Supervising Model Attention with Human Explanations for Robust Natural Language Inference
AAAI 2022
IDANI: Inference-time Domain Adaptation via Neuron-level Interventions
NAACL 2022
Choose Your Lenses: Flaws in Gender Bias Evaluation
NAACL 2022
A Generative Approach for Mitigating Structural Biases in Natural Language Inference
NAACL 2022
On the Pitfalls of Analyzing Individual Neurons in Language Models
ICLR 2022
Measures of Information Reflect Memorization Patterns
NIPS 2022
A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference
EMNLP 2022
Locating and Editing Factual Associations in GPT
NIPS 2022
Learning from others' mistakes: Avoiding dataset biases without modeling them
ICLR 2021
IRM–when it works and when it doesn't: A test case of natural language inference
NIPS 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
ACL 2021
Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?
EACL 2021
Debiasing Methods in Natural Language Understanding Make Bias More Accessible
EMNLP 2021
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
ICLR 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
IJCNLP 2021
End-to-End Bias Mitigation by Modelling Biases in Corpora
ACL 2020
Investigating Gender Bias in Language Models Using Causal Mediation Analysis
NIPS 2020
Similarity Analysis of Contextual Word Representation Models
ACL 2020
Findings of the WMT 2020 Shared Task on Machine Translation Robustness
EMNLP 2020
Analyzing Redundancy in Pretrained Transformer Models
EMNLP 2020
Analyzing Individual Neurons in Pre-trained Language Models
EMNLP 2020
The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
ACL 2020
A Constructive Prediction of the Generalization Error Across Scales
ICLR 2020
Probing Neural Dialog Models for Conversational Understanding
ACL 2020
Interpretability and Analysis in Neural NLP
ACL 2020
Linguistic Knowledge and Transferability of Contextual Representations
NAACL 2019
Identifying and Controlling Important Neurons in Neural Machine Translation
ICLR 2019
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
INTERSPEECH 2019
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
NAACL 2019
One Size Does Not Fit All: Comparing NMT Representations of Different Granularities
NAACL 2019
Findings of the First Shared Task on Machine Translation Robustness
ACL 2019
Analyzing the Structure of Attention in a Transformer Language Model
ACL 2019
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
ACL 2019
Improving Neural Language Models by Segmenting, Attending, and Predicting the Future
ACL 2019
Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
ACL 2019
LSTM Networks Can Perform Dynamic Counting
ACL 2019
NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks
AAAI 2019
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
AAAI 2019
Synthetic and Natural Noise Both Break Neural Machine Translation
ICLR 2018
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
NAACL 2018
Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks
IJCNLP 2017
QMDIS: QCRI-MIT Advanced Dialect Identification System
INTERSPEECH 2017
What do Neural Machine Translation Models Learn about Morphology?
ACL 2017
Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging
ACL 2017
Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder
IJCNLP 2017
Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
NIPS 2017
Neural Attention for Learning to Rank Questions in Community Question Answering
COLING 2016
SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering
SEMEVAL 2016
VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems
SEMEVAL 2015
Arabic Diacritization with Recurrent Neural Networks
EMNLP 2015
Translating Dialectal Arabic to English
ACL 2013