Daniel Deutsch
41 papers · 2018–2026 · 9 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Academic Marathon (7)
Conference Polyglot (9)
Renaissance Researcher (7)
Taxonomy Completionist (45)
Keyword Pioneer
Hot Topic Early Bird
Conference Loyalist (21)
Dynamic Duo (19)
Topic Evolution
Keyword Champion (6)
Mega-Team (77)
Deep Specialist (20)
Unstoppable (8)
The Questioner (3)
Prolific Year (8)
Century Club (39)
Keyword Collector (132)
Conferences
EMNLP (21)
ACL (6)
NAACL (4)
CoNLL (2)
EACL (2)
ICML (2)
IJCNLP (2)
AACL (1)
COLING (1)
Keywords
machine translation (16)
evaluation metric (11)
text summarization (7)
human evaluation (6)
summarization evaluation (6)
large language model (5)
quality estimation (5)
machine translation evaluation (4)
question answering (4)
translation evaluation (4)
automatic metric (4)
text generation (4)
annotation quality (4)
translation quality (3)
reference-based metrics (3)
multidimensional quality metrics (3)
abstractive model (2)
neural metric (2)
abstractive summarization (2)
minimum bayes risk (2)
Papers
Generating Difficult-to-Translate Texts
EACL 2026
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation
ACL 2026
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
ACL 2025
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?
EMNLP 2025
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation
ICML 2025
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
ICML 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
EMNLP 2025
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
EMNLP 2025
Don't Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation
EMNLP 2025
Enhancing Human Evaluation in Machine Translation with Comparative Judgement
ACL 2025
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback
NAACL 2024
Finding Replicable Human Evaluations via Stable Ranking Probability
NAACL 2024
Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task
EMNLP 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
EMNLP 2024
Mitigating Metric Bias in Minimum Bayes Risk Decoding
EMNLP 2024
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
EMNLP 2024
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
EMNLP 2024
On the Role of Summary Content Units in Text Summarization Evaluation
NAACL 2024
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
EMNLP 2023
A Needle in a Haystack: An Analysis of High-Agreement Workers on MTurk for Summarization
ACL 2023
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection
EACL 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
EMNLP 2023
There's No Data like Better Data: Using QE Metrics for MT Data Filtering
EMNLP 2023
Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent
EMNLP 2023
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
EMNLP 2023
Quality Estimation Using Minimum Bayes Risk
EMNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
AACL 2023
The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
EMNLP 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
IJCNLP 2023
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
EMNLP 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
NAACL 2022
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
ACL 2022
On the Limitations of Reference-Free Evaluations of Generated Text
EMNLP 2022
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
EMNLP 2021
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
CoNLL 2021
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
EMNLP 2020
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
COLING 2020
Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization
EMNLP 2019
A General-Purpose Algorithm for Constrained Sequential Inference
CoNLL 2019
Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization
IJCNLP 2019
A Distributional and Orthographic Aggregation Model for English Derivational Morphology
ACL 2018