conftrace_

Saad Mahamood

11 papers · 2021–2026 · 6 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (16) 🌍 Conference Polyglot (6) 🏃 Academic Marathon (5) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 👥 Mega-Team (77) 💎 Century Club (10) ❓ The Questioner 🔥 Unstoppable (5)

Conferences

EACL (3) ACL (2) EMNLP (2) NAACL (2) COLING (1) IJCNLP (1)

Top co-authors

Ondřej Dušek (6) Simon Mille (5) João Sedoc (5) Sebastian Gehrmann (5) Yufang Hou (5) Khyathi Raghavi Chandu (4) Dimitra Gkatzia (4) Kaustubh Dhole (3) Vitaly Nikolaev (3) Yixin Liu (3)

Keywords

human evaluation (6) natural language generation (5) evaluation metric (3) summarization evaluation (2) inter-annotator agreement (2) automated metric (2) annotation quality (2) evaluation methodology (2) large language model (2) text summarization (2) user interface (1) reproducibility study (1) end-to-end approach (1) user experience (1) experimental methodology (1) nlp research (1) referring expression generation (1) pyramid evaluation (1) span annotation (1) annotator quality (1)

Papers

LLMs as Span Annotators: A Comparative Study of LLMs and Humans · EACL 2026
Lessons from a User Experience Evaluation of NLP Interfaces · NAACL 2025
Real-World Summarization: When Evaluation Reaches Its Limits · EMNLP 2025
ReproHum #0124-03: Reproducing Human Evaluations of end-to-end approaches for Referring Expression Generation · COLING 2024
On the Role of Summary Content Units in Text Summarization Evaluation · NAACL 2024
A Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization · ACL 2023
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP · EACL 2023
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code · EMNLP 2022
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics · IJCNLP 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics · ACL 2021
It’s Commonsense, isn’t it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems · EACL 2021