Markus Freitag
62 papers · 2012–2026 · 8 top CS/AI conferences
Achievements
🌍 Conference Polyglot (8)
🏃 Academic Marathon (13)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (12)
🌈 Renaissance Researcher (6)
🏠 Conference Loyalist (37)
🤝 Dynamic Duo (19)
👥 Mega-Team (36)
🔬 Deep Specialist (43)
🏆 Keyword Champion (3)
🗃️ Keyword Collector (187)
📈 Trend Setter
⚡ Prolific Year (13)
❓ The Questioner (4)
🔥 Unstoppable (8)
💎 Century Club (60)
Conferences
EMNLP (37)
ACL (9)
NAACL (5)
ICML (4)
ICLR (3)
EACL (2)
COLING (1)
NeurIPS (1)
Keywords
machine translation (33)
human evaluation (14)
neural machine translation (11)
quality estimation (10)
translation evaluation (8)
large language model (8)
translation quality (8)
automatic metric (7)
automatic post-editing (5)
minimum bayes risk (5)
evaluation metric (5)
text generation (5)
multidimensional quality metrics (4)
machine translation evaluation (4)
multilingual model (4)
multilingual translation (3)
annotation quality (3)
neural metric (3)
metric correlation (3)
unsupervised learning (2)
Papers
Generating Difficult-to-Translate Texts
EACL 2026
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation
ACL 2026
Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations
ICML 2025
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation
ICML 2025
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
ICML 2025
Enhancing Human Evaluation in Machine Translation with Comparative Judgement
ACL 2025
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
ACL 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
EMNLP 2025
Google Translate’s Research Submission to WMT2025
EMNLP 2025
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help
EMNLP 2025
Findings of the WMT25 General Machine Translation Shared Task: Time to Stop Evaluating on Easy Test Sets
EMNLP 2025
Findings of the WMT25 Multilingual Instruction Shared Task: Persistent Hurdles in Reasoning, Generation, and Evaluation
EMNLP 2025
Feeding Two Birds or Favoring One? Adequacy–Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation
EMNLP 2025
Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms
NeurIPS 2024
Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model
ACL 2024
Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved Yet
EMNLP 2024
Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task
EMNLP 2024
Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?
EMNLP 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
EMNLP 2024
Mitigating Metric Bias in Minimum Bayes Risk Decoding
EMNLP 2024
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data
EMNLP 2024
Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts
EMNLP 2024
Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data
EMNLP 2024
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
ICLR 2024
Finding Replicable Human Evaluations via Stable Ranking Probability
NAACL 2024
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback
NAACL 2024
Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet
EMNLP 2023
There’s No Data like Better Data: Using QE Metrics for MT Data Filtering
EMNLP 2023
Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent
EMNLP 2023
Findings of the WMT 2023 Shared Task on Automatic Post-Editing
EMNLP 2023
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task
EMNLP 2023
Quality Estimation Using Minimum Bayes Risk
EMNLP 2023
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
EMNLP 2023
The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
EMNLP 2023
Scaling Laws for Multilingual Neural Machine Translation
ICML 2023
Prompting PaLM for Translation: Assessing Strategies and Performance
ACL 2023
Language models are multilingual chain-of-thought reasoners
ICLR 2023
INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback
EMNLP 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
EMNLP 2023
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation
EMNLP 2023
Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance
NAACL 2022
Scaling Laws for Neural Machine Translation
ICLR 2022
Findings of the WMT 2022 Shared Task on Automatic Post-Editing
EMNLP 2022
Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust
EMNLP 2022
Toward More Effective Human Evaluation for Machine Translation
ACL 2022
A Natural Diet: Towards Improving Naturalness of Machine Translation Output
ACL 2022
On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation
NAACL 2022
Assessing Reference-Free Peer Evaluation for Machine Translation
NAACL 2021
Findings of the 2021 Conference on Machine Translation (WMT21)
EMNLP 2021
Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain
EMNLP 2021
Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task
EMNLP 2020
Human-Paraphrased References Improve Neural Machine Translation
EMNLP 2020
BLEU might be Guilty but References are not Innocent
EMNLP 2020
Findings of the WMT 2020 Shared Task on Automatic Post-Editing
EMNLP 2020
Results of the WMT20 Metrics Shared Task
EMNLP 2020
Complete Multilingual Neural Machine Translation
EMNLP 2020
KoBE: Knowledge-Based Machine Translation Evaluation
EMNLP 2020
Translationese as a Language in “Multilingual” NMT
ACL 2020
APE at Scale and Its Implications on MT Evaluation Biases
ACL 2019
Unsupervised Natural Language Generation with Denoising Autoencoders
EMNLP 2018
Jane: Open Source Machine Translation System Combination
EACL 2014
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation
COLING 2012