Daniel Deutsch

41 papers · 2018–2026 · 9 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (7) 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (45) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (21) 🤝 Dynamic Duo (19) 🧬 Topic Evolution 🏆 Keyword Champion (6) 👥 Mega-Team (77) 🔬 Deep Specialist (20) 🔥 Unstoppable (8) ❓ The Questioner (3) ⚡ Prolific Year (8) 💎 Century Club (39) 🗃️ Keyword Collector (132)

Conferences

EMNLP (21) ACL (6) NAACL (4) CONLL (2) EACL (2) ICML (2) IJCNLP (2) AACL (1) COLING (1)

Top co-authors

Markus Freitag (21) Dan Roth (12) Mara Finkelstein (11) Juraj Juraska (11) Parker Riley (8) Rotem Dror (4) Brian Thompson (4) Geza Kovacs (4) Jiayi Wang (3) Eleftheria Briakou (3)

Research topics

Applications (1)

Keywords

machine translation (16) evaluation metric (11) text summarization (7) human evaluation (6) summarization evaluation (6) large language model (5) quality estimation (5) machine translation evaluation (4) question answering (4) translation evaluation (4) automatic metric (4) text generation (4) annotation quality (4) translation quality (3) reference-based metrics (3) multidimensional quality metrics (3) abstractive model (2) neural metric (2) abstractive summarization (2) minimum bayes risk (2)

Papers

Generating Difficult-to-Translate Texts EACL 2026
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation ACL 2026
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects ACL 2025
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages? EMNLP 2025
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation ICML 2025
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set ICML 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task EMNLP 2025
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help EMNLP 2025
Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation EMNLP 2025
Enhancing Human Evaluation in Machine Translation with Comparative Judgement ACL 2025
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback NAACL 2024
Finding Replicable Human Evaluations via Stable Ranking Probability NAACL 2024
Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task EMNLP 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task EMNLP 2024
Mitigating Metric Bias in Minimum Bayes Risk Decoding EMNLP 2024
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data EMNLP 2024
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy EMNLP 2024
On the Role of Summary Content Units in Text Summarization Evaluation NAACL 2024
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level EMNLP 2023
A Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization ACL 2023
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection EACL 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration EMNLP 2023
There’s No Data like Better Data: Using QE Metrics for MT Data Filtering EMNLP 2023
Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent EMNLP 2023
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task EMNLP 2023
Quality Estimation Using Minimum Bayes Risk EMNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics AACL 2023
The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation EMNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics IJCNLP 2023
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code EMNLP 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics NAACL 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics ACL 2022
On the Limitations of Reference-Free Evaluations of Generated Text EMNLP 2022
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries EMNLP 2021
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries CONLL 2021
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics EMNLP 2020
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection COLING 2020
Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization EMNLP 2019
A General-Purpose Algorithm for Constrained Sequential Inference CONLL 2019
Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization IJCNLP 2019
A Distributional and Orthographic Aggregation Model for English Derivational Morphology ACL 2018