Dirk Hovy
121 papers · 2009–2026 · 9 top CS/AI conferences
Achievements
🌍 Conference Polyglot (9)
🐣 Hot Topic Early Bird
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🏃 Academic Marathon (16)
🏠 Conference Loyalist (43)
🐺 Lone Wolf (5)
🤝 Dynamic Duo (20)
🧬 Topic Evolution
👥 Mega-Team (42)
🏆 Keyword Champion (4)
🔬 Deep Specialist (30)
🗃️ Keyword Collector (352)
❓ The Questioner (13)
⚡ Prolific Year (9)
💎 Century Club (115)
🔥 Unstoppable (17)
📈 Trend Setter
Conferences
ACL (44)
EMNLP (23)
EACL (16)
NAACL (16)
COLING (9)
IJCNLP (7)
SEMEVAL (3)
CONLL (2)
AAAI (1)
Research topics
Keywords
large language model (17)
text classification (17)
natural language processing (13)
hate speech detection (9)
language model (7)
gender bias (7)
sentiment analysis (7)
zero-shot learning (5)
multi-task learning (4)
responsible ai (4)
model evaluation (4)
social media analysis (4)
multilingual nlp (4)
inter-annotator agreement (4)
representation learning (3)
multilingual model (3)
emotion analysis (3)
attention mechanism (3)
document embedding (3)
prompt engineering (3)
Papers
Do Large Language Models Adapt to Language Variation across Socioeconomic Status?
EACL 2026
PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors
EACL 2026
Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?
EACL 2026
The Pluralistic Moral Gap: Understanding Moral Judgment and Value Differences between Humans and Large Language Models
EACL 2026
Exploring Subjective Tasks in Farsi: A Survey Analysis and Evaluation of Language Models
EACL 2026
Responsible Evaluation of AI for Mental Health
ACL 2026
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions
ACL 2025
Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions
ACL 2025
Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement
EMNLP 2025
Social Intelligence in the Age of LLMs
NAACL 2025
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
EMNLP 2025
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
AAAI 2025
No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models
EMNLP 2025
Biased Tales: Cultural and Topic Bias in Generating Children’s Stories
EMNLP 2025
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
EMNLP 2025
Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
ACL 2025
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
NAACL 2024
Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution
ACL 2024
Classist Tools: Social Class Correlates with Performance in NLP
ACL 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
ACL 2024
Narratives at Conflict: Computational Analysis of News Framing in Multilingual Disinformation Campaigns
ACL 2024
Compromesso! Italian Many-Shot Jailbreaks undermine the safety of Large Language Models
ACL 2024
“My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
ACL 2024
Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?
ACL 2024
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
EACL 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
NAACL 2024
Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions
COLING 2024
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods
COLING 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
EMNLP 2024
Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models
EMNLP 2024
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
COLING 2024
Impoverished Language Technology: The Lack of (Social) Class in NLP
COLING 2024
Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers
EACL 2023
MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection
SEMEVAL 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
EACL 2023
What about “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns
ACL 2023
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics
ACL 2023
The State of Profanity Obfuscation in Natural Language Processing Scientific Publications
ACL 2023
Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling
ACL 2023
MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection
ACL 2023
Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech
ACL 2023
XLM-EMO: Multilingual Emotion Prediction in Social Media Text
ACL 2022
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection
ACL 2022
Language Invariant Properties in Natural Language Processing
ACL 2022
Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals
ACL 2022
Pipelines for Social Bias Testing of Large Language Models
ACL 2022
Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists
ACL 2022
Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa
ACL 2022
Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender
COLING 2022
SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems
ACL 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks
NAACL 2022
Bridging Fairness and Environmental Sustainability in Natural Language Processing
EMNLP 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
EMNLP 2022
Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data
EMNLP 2022
“It’s Not Just Hate”: A Multi-Dimensional Perspective on Detecting Harmful Speech Online
EMNLP 2022
SocioProbe: What, When, and Where Language Models Learn about Sociodemographics
EMNLP 2022
The Importance of Modeling Social Factors of Language: Theory and Practice
NAACL 2021
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
ACL 2021
“We will Reduce Taxes” - Identifying Election Pledges with Language Models
ACL 2021
On the Gap between Adoption and Understanding in NLP
ACL 2021
We Need to Consider Disagreement in Evaluation
ACL 2021
Cross-lingual Contextualized Topic Models with Zero-shot Learning
EACL 2021
BERTective: Language Models and Contextual Information for Deception Detection
EACL 2021
Universal Joy A Data Set and Results for Classifying Emotions Across Languages
EACL 2021
FEEL-IT: Emotion and Sentiment Classification for the Italian Language
EACL 2021
MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?
EACL 2021
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence
IJCNLP 2021
“We will Reduce Taxes” - Identifying Election Pledges with Language Models
IJCNLP 2021
On the Gap between Adoption and Understanding in NLP
IJCNLP 2021
We Need to Consider Disagreement in Evaluation
IJCNLP 2021
HONEST: Measuring Hurtful Sentence Completion in Language Models
NAACL 2021
Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning
NAACL 2021
A Report on the VarDial Evaluation Campaign 2020
COLING 2020
Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success
EMNLP 2020
“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases
ACL 2020
Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview
ACL 2020
Integrating Ethics into the NLP Curriculum
ACL 2020
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
NAACL 2019
Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers
EMNLP 2019
Dense Node Representation for Geolocation
EMNLP 2019
Identifying Linguistic Areas for Geolocation
EMNLP 2019
Geolocation with Attention-Based Multitask Learning Models
EMNLP 2019
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP
EMNLP 2019
Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing
ACL 2019
Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting
EMNLP 2018
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing
NAACL 2018
Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information
EMNLP 2018
The Social and the Neural Network: How to Make Natural Language Processing about People again
NAACL 2018
Predicting News Headline Popularity with Syntactic and Semantic Knowledge Using Multi-Task Learning
EMNLP 2018
Multitask Learning for Mental Health Conditions with Limited Social Media Data
EACL 2017
The Social Impact of Natural Language Processing
ACL 2016
The Enemy in Your Own Camp: How Well Can We Detect Statistically-Generated Fake Reviews – An Adversarial Study
ACL 2016
Learning a POS tagger for AAVE-like language
NAACL 2016
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
SEMEVAL 2016
Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations
ACL 2016
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter
NAACL 2016
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
IJCNLP 2015
Cross-lingual syntactic variation over age and gender
CONLL 2015
Demographic Factors Improve Classification Performance
IJCNLP 2015
Tagging Performance Correlates with Author Age
IJCNLP 2015
Demographic Factors Improve Classification Performance
ACL 2015
Mining for unambiguous instances to adapt part-of-speech taggers to new domains
NAACL 2015
Tagging Performance Correlates with Author Age
ACL 2015
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages
ACL 2015
The Rating Game: Sentiment Rating Reproducibility from Text
EMNLP 2015
Experiments with crowdsourced re-annotation of a POS tagging data set
ACL 2014
How Well can We Learn Interpretable Entity Types from Text?
ACL 2014
Linguistically debatable or just plain wrong?
ACL 2014
What’s in a p-value in NLP?
CONLL 2014
Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems
SEMEVAL 2014
Adapting taggers to Twitter with not-so-distant supervision
COLING 2014
Learning part-of-speech taggers with inter-annotator agreement loss
EACL 2014
Selection Bias, Label Bias, and Bias in Ground Truth
COLING 2014
Learning Whom to Trust with MACE
NAACL 2013
A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
EMNLP 2013
When Did that Happen? — Linking Events and Relations to Timestamps
EACL 2012
Exploiting Partial Annotations with EM Training
NAACL 2012
Unsupervised Discovery of Domain-Specific Knowledge from Text
ACL 2011
Models and Training for Unsupervised Preposition Sense Disambiguation
ACL 2011
What’s in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class
COLING 2010
Disambiguation of Preposition Sense Using Linguistically Motivated Features
NAACL 2009