Su Lin Blodgett

25 papers · 2016–2026 · 6 conferences · across top CS/AI conferences

Achievements


🏃 Academic Marathon (9) 🌍 Conference Polyglot (6) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (6)

🌈 Renaissance Researcher (6) 👥 Mega-Team (20) 🤝 Dynamic Duo (10) 🔬 Deep Specialist (11) 🧬 Topic Evolution 🏆 Keyword Champion (4) 🗃️ Keyword Collector (107) ⚡ Prolific Year (6) 🔥 Unstoppable (6) 💎 Century Club (24) ❓ The Questioner

Conferences

ACL (12) EMNLP (5) NAACL (4) IJCNLP (2) AAAI (1) ICML (1)

Top co-authors

Alexandra Olteanu (10) Hanna Wallach (9) Hal Daume III (5) Brendan O’Connor (4) Solon Barocas (3) Emily Sheng (3) Yulia Tsvetkov (2) Eve Fleisig (2) Anjalie Field (2) Ziang Xiao (2)

Keywords

natural language processing (5) nlp evaluation (4) measurement modeling (4) stereotyping detection (3) responsible ai (3) bias evaluation (3) bias detection (3) human-centered evaluation (2) dependency parsing (2) measurement model (2) text generation (2) natural language generation (2) measurement validity (2) racial bias (2) coreference resolution (2) representational harm (2) fairness benchmark (2) text summarization (1) semantic analysis (1) data augmentation (1)

Papers

Illusions of the Gold Standard: A Large-scale Analysis of Human Evaluation Protocols for Long-form Text Generation · ACL 2026
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems · ACL 2025
Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems · ACL 2025
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge · ICML 2025
Understanding the Impacts of Language Technologies’ Performance Disparities on African American Language Speakers · ACL 2024
Human-Centered Evaluation of Language Technologies · EMNLP 2024
The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels · NAACL 2024
“One-Size-Fits-All”? Examining Expectations around What Constitute “Fair” or “Good” NLG System Behaviors · NAACL 2024
ECBD: Evidence-Centered Benchmark Design for NLP · ACL 2024
Metrics for What, Metrics for Whom: Assessing Actionability of Bias Evaluation Metrics in NLP · EMNLP 2024
Taxonomizing and Measuring Representational Harms: A Look at Image Tagging · AAAI 2023
FairPrism: Evaluating Fairness-Related Harms in Text Generation · ACL 2023
This prompt is measuring <mask>: evaluating bias evaluation in language models · ACL 2023
It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance · ACL 2023
Responsible AI Considerations in Text Summarization Research: A Review of Current Practices · EMNLP 2023
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications · NAACL 2022
Examining Political Rhetoric with Epistemic Stance Detection · EMNLP 2022
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets · ACL 2021
A Survey of Race, Racism, and Anti-Racism in NLP · IJCNLP 2021
Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets · IJCNLP 2021
A Survey of Race, Racism, and Anti-Racism in NLP · ACL 2021
Language (Technology) is Power: A Critical Survey of “Bias” in NLP · ACL 2020
Twitter Universal Dependency Parsing for African-American and Mainstream American English · ACL 2018
Monte Carlo Syntax Marginals for Exploring and Using Dependency Parses · NAACL 2018
Demographic Dialectal Variation in Social Media: A Case Study of African-American English · EMNLP 2016