conftrace_

Dirk Hovy

121 papers · 2009–2026 · across 9 top CS/AI conferences

Achievements

🌍 Conference Polyglot (9)
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (16)
🏠 Conference Loyalist (43)
🐺 Lone Wolf (5)
🤝 Dynamic Duo (20)
🧬 Topic Evolution
👥 Mega-Team (42)
🏆 Keyword Champion (4)
🔬 Deep Specialist (30)
🗃️ Keyword Collector (352)
The Questioner (13)
Prolific Year (9)
💎 Century Club (115)
🔥 Unstoppable (17)
📈 Trend Setter

Conferences

ACL (44) EMNLP (23) EACL (16) NAACL (16) COLING (9) IJCNLP (7) SEMEVAL (3) CONLL (2) AAAI (1)

Papers

Do Large Language Models Adapt to Language Variation across Socioeconomic Status? · EACL 2026
PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors · EACL 2026
Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? · EACL 2026
The Pluralistic Moral Gap: Understanding Moral Judgment and Value Differences between Humans and Large Language Models · EACL 2026
Exploring Subjective Tasks in Farsi: A Survey Analysis and Evaluation of Language Model · EACL 2026
Responsible Evaluation of AI for Mental Health · ACL 2026
The AI Gap: How Socioeconomic Status Affects Language Technology Interactions · ACL 2025
Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals’ Subjective Text Perceptions · ACL 2025
Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement · EMNLP 2025
Social Intelligence in the Age of LLMs · NAACL 2025
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification · EMNLP 2025
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety · AAAI 2025
No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models · EMNLP 2025
Biased Tales: Cultural and Topic Bias in Generating Children’s Stories · EMNLP 2025
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance · EMNLP 2025
Educators’ Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting · ACL 2025
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts · NAACL 2024
Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution · ACL 2024
Classist Tools: Social Class Correlates with Performance in NLP · ACL 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models · ACL 2024
Narratives at Conflict: Computational Analysis of News Framing in Multilingual Disinformation Campaigns · ACL 2024
Compromesso! Italian Many-Shot Jailbreaks undermine the safety of Large Language Models · ACL 2024
“My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models · ACL 2024
Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both? · ACL 2024
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features · EACL 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models · NAACL 2024
Emotion Analysis in NLP: Trends, Gaps and Roadmap for Future Directions · COLING 2024
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods · COLING 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps · EMNLP 2024
Divine LLaMAs: Bias, Stereotypes, Stigmatization, and Emotion Representation of Religion in Large Language Models · EMNLP 2024
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation · COLING 2024
Impoverished Language Technology: The Lack of (Social) Class in NLP · COLING 2024
Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers · EACL 2023
MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection · SEMEVAL 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP · EACL 2023
What about “em”? How Commercial Machine Translation Fails to Handle (Neo-)Pronouns · ACL 2023
The Ecological Fallacy in Annotation: Modeling Human Label Variation goes beyond Sociodemographics · ACL 2023
The State of Profanity Obfuscation in Natural Language Processing Scientific Publications · ACL 2023
Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling · ACL 2023
MilaNLP at SemEval-2023 Task 10: Ensembling Domain-Adapted and Regularized Pretrained Language Models for Robust Sexism Detection · ACL 2023
Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech · ACL 2023
XLM-EMO: Multilingual Emotion Prediction in Social Media Text · ACL 2022
Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection · ACL 2022
Language Invariant Properties in Natural Language Processing · ACL 2022
Measuring Harmful Sentence Completion in Language Models for LGBTQIA+ Individuals · ACL 2022
Pipelines for Social Bias Testing of Large Language Models · ACL 2022
Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists · ACL 2022
Hard and Soft Evaluation of NLP models with BOOtSTrap SAmpling - BooStSa · ACL 2022
Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender · COLING 2022
SafetyKit: First Aid for Measuring Safety in Open-domain Conversational Systems · ACL 2022
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks · NAACL 2022
Bridging Fairness and Environmental Sustainability in Natural Language Processing · EMNLP 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages · EMNLP 2022
Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data · EMNLP 2022
“It’s Not Just Hate”: A Multi-Dimensional Perspective on Detecting Harmful Speech Online · EMNLP 2022
SocioProbe: What, When, and Where Language Models Learn about Sociodemographics · EMNLP 2022
The Importance of Modeling Social Factors of Language: Theory and Practice · NAACL 2021
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence · ACL 2021
“We will Reduce Taxes” - Identifying Election Pledges with Language Models · ACL 2021
On the Gap between Adoption and Understanding in NLP · ACL 2021
We Need to Consider Disagreement in Evaluation · ACL 2021
Cross-lingual Contextualized Topic Models with Zero-shot Learning · EACL 2021
BERTective: Language Models and Contextual Information for Deception Detection · EACL 2021
Universal Joy A Data Set and Results for Classifying Emotions Across Languages · EACL 2021
FEEL-IT: Emotion and Sentiment Classification for the Italian Language · EACL 2021
MilaNLP @ WASSA: Does BERT Feel Sad When You Cry? · EACL 2021
Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence · IJCNLP 2021
“We will Reduce Taxes” - Identifying Election Pledges with Language Models · IJCNLP 2021
On the Gap between Adoption and Understanding in NLP · IJCNLP 2021
We Need to Consider Disagreement in Evaluation · IJCNLP 2021
HONEST: Measuring Hurtful Sentence Completion in Language Models · NAACL 2021
Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning · NAACL 2021
A Report on the VarDial Evaluation Campaign 2020 · COLING 2020
Helpful or Hierarchical? Predicting the Communicative Strategies of Chat Participants, and their Impact on Success · EMNLP 2020
“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases · ACL 2020
Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview · ACL 2020
Integrating Ethics into the NLP Curriculum · ACL 2020
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science · NAACL 2019
Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers · EMNLP 2019
Dense Node Representation for Geolocation · EMNLP 2019
Identifying Linguistic Areas for Geolocation · EMNLP 2019
Geolocation with Attention-Based Multitask Learning Models · EMNLP 2019
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP · EMNLP 2019
Women’s Syntactic Resilience and Men’s Grammatical Luck: Gender-Bias in Part-of-Speech Tagging and Dependency Parsing · ACL 2019
Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting · EMNLP 2018
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing · NAACL 2018
Increasing In-Class Similarity by Retrofitting Embeddings with Demographic Information · EMNLP 2018
The Social and the Neural Network: How to Make Natural Language Processing about People again · NAACL 2018
Predicting News Headline Popularity with Syntactic and Semantic Knowledge Using Multi-Task Learning · EMNLP 2018
Multitask Learning for Mental Health Conditions with Limited Social Media Data · EACL 2017
The Social Impact of Natural Language Processing · ACL 2016
The Enemy in Your Own Camp: How Well Can We Detect Statistically-Generated Fake Reviews – An Adversarial Study · ACL 2016
Learning a POS tagger for AAVE-like language · NAACL 2016
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM) · SEMEVAL 2016
Putting Sarcasm Detection into Context: The Effects of Class Imbalance and Manual Labelling on Supervised Machine Classification of Twitter Conversations · ACL 2016
Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter · NAACL 2016
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages · IJCNLP 2015
Cross-lingual syntactic variation over age and gender · CONLL 2015
Demographic Factors Improve Classification Performance · IJCNLP 2015
Tagging Performance Correlates with Author Age · IJCNLP 2015
Demographic Factors Improve Classification Performance · ACL 2015
Mining for unambiguous instances to adapt part-of-speech taggers to new domains · NAACL 2015
Tagging Performance Correlates with Author Age · ACL 2015
If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages · ACL 2015
The Rating Game: Sentiment Rating Reproducibility from Text · EMNLP 2015
Experiments with crowdsourced re-annotation of a POS tagging data set · ACL 2014
How Well can We Learn Interpretable Entity Types from Text? · ACL 2014
Linguistically debatable or just plain wrong? · ACL 2014
What’s in a p-value in NLP? · CONLL 2014
Copenhagen-Malmö: Tree Approximations of Semantic Parsing Problems · SEMEVAL 2014
Adapting taggers to Twitter with not-so-distant supervision · COLING 2014
Learning part-of-speech taggers with inter-annotator agreement loss · EACL 2014
Selection Bias, Label Bias, and Bias in Ground Truth · COLING 2014
Learning Whom to Trust with MACE · NAACL 2013
A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations · EMNLP 2013
When Did that Happen? - Linking Events and Relations to Timestamps · EACL 2012
Exploiting Partial Annotations with EM Training · NAACL 2012
Unsupervised Discovery of Domain-Specific Knowledge from Text · ACL 2011
Models and Training for Unsupervised Preposition Sense Disambiguation · ACL 2011
What’s in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class · COLING 2010
Disambiguation of Preposition Sense Using Linguistically Motivated Features · NAACL 2009