conftrace_

Markus Freitag

62 papers · 2012–2026 · 8 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🏃 Academic Marathon (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (6) 🏠 Conference Loyalist (37) 🤝 Dynamic Duo (19) 👥 Mega-Team (36) 🔬 Deep Specialist (43) 🏆 Keyword Champion (3) 🗃️ Keyword Collector (187) 📈 Trend Setter ⚡ Prolific Year (13) ❓ The Questioner (4) 🔥 Unstoppable (8) 💎 Century Club (60)

Conferences

EMNLP (37) ACL (9) NAACL (5) ICML (4) ICLR (3) EACL (2) COLING (1) NIPS (1)

Top co-authors

Daniel Deutsch (21) Mara Finkelstein (17) Juraj Juraska (12) Parker Riley (11) George Foster (10) Tom Kocmi (9) Colin Cherry (8) David Vilar (8) Eleftherios Avramidis (8) Ondřej Bojar (6)

Keywords

machine translation (33) human evaluation (14) neural machine translation (11) quality estimation (10) translation evaluation (8) large language model (8) translation quality (8) automatic metric (7) automatic post-editing (5) minimum bayes risk (5) evaluation metric (5) text generation (5) multidimensional quality metrics (4) machine translation evaluation (4) multilingual model (4) multilingual translation (3) annotation quality (3) neural metric (3) metric correlation (3) unsupervised learning (2)

Papers

Generating Difficult-to-Translate Texts · EACL 2026
MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation · ACL 2026
Learning from others’ mistakes: Finetuning machine translation models with span-level error annotations · ICML 2025
Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination’s Impact on Machine Translation · ICML 2025
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set · ICML 2025
Enhancing Human Evaluation in Machine Translation with Comparative Judgement · ACL 2025
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects · ACL 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task · EMNLP 2025
Google Translate’s Research Submission to WMT2025 · EMNLP 2025
Findings of the WMT25 Shared Task on Automated Translation Evaluation Systems: Linguistic Diversity is Challenging and References Still Help · EMNLP 2025
Findings of the WMT25 General Machine Translation Shared Task: Time to Stop Evaluating on Easy Test Sets · EMNLP 2025
Findings of the WMT25 Multilingual Instruction Shared Task: Persistent Hurdles in Reasoning, Generation, and Evaluation · EMNLP 2025
Feeding Two Birds or Favoring One? Adequacy–Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation · EMNLP 2025
Efficient Minimum Bayes Risk Decoding using Low-Rank Matrix Completion Algorithms · NIPS 2024
Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model · ACL 2024
Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved Yet · EMNLP 2024
Are LLMs Breaking MT Metrics? Results of the WMT24 Metrics Shared Task · EMNLP 2024
Findings of the Quality Estimation Shared Task at WMT 2024: Are LLMs Closing the Gap in QE? · EMNLP 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task · EMNLP 2024
Mitigating Metric Bias in Minimum Bayes Risk Decoding · EMNLP 2024
Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data · EMNLP 2024
Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts · EMNLP 2024
Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data · EMNLP 2024
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods · ICLR 2024
Finding Replicable Human Evaluations via Stable Ranking Probability · NAACL 2024
LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback · NAACL 2024
Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet · EMNLP 2023
There’s No Data like Better Data: Using QE Metrics for MT Data Filtering · EMNLP 2023
Results of WMT23 Metrics Shared Task: Metrics Might Be Guilty but References Are Not Innocent · EMNLP 2023
Findings of the WMT 2023 Shared Task on Automatic Post-Editing · EMNLP 2023
MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task · EMNLP 2023
Quality Estimation Using Minimum Bayes Risk · EMNLP 2023
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level · EMNLP 2023
The Devil Is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation · EMNLP 2023
Scaling Laws for Multilingual Neural Machine Translation · ICML 2023
Prompting PaLM for Translation: Assessing Strategies and Performance · ACL 2023
Language models are multilingual chain-of-thought reasoners · ICLR 2023
INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback · EMNLP 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration · EMNLP 2023
Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation · EMNLP 2023
Original or Translated? A Causal Analysis of the Impact of Translationese on Machine Translation Performance · NAACL 2022
Scaling Laws for Neural Machine Translation · ICLR 2022
Findings of the WMT 2022 Shared Task on Automatic Post-Editing · EMNLP 2022
Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust · EMNLP 2022
Toward More Effective Human Evaluation for Machine Translation · ACL 2022
A Natural Diet: Towards Improving Naturalness of Machine Translation Output · ACL 2022
On Systematic Style Differences between Unsupervised and Supervised MT and an Application for High-Resource Machine Translation · NAACL 2022
Assessing Reference-Free Peer Evaluation for Machine Translation · NAACL 2021
Findings of the 2021 Conference on Machine Translation (WMT21) · EMNLP 2021
Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain · EMNLP 2021
Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task · EMNLP 2020
Human-Paraphrased References Improve Neural Machine Translation · EMNLP 2020
BLEU might be Guilty but References are not Innocent · EMNLP 2020
Findings of the WMT 2020 Shared Task on Automatic Post-Editing · EMNLP 2020
Results of the WMT20 Metrics Shared Task · EMNLP 2020
Complete Multilingual Neural Machine Translation · EMNLP 2020
KoBE: Knowledge-Based Machine Translation Evaluation · EMNLP 2020
Translationese as a Language in “Multilingual” NMT · ACL 2020
APE at Scale and Its Implications on MT Evaluation Biases · ACL 2019
Unsupervised Natural Language Generation with Denoising Autoencoders · EMNLP 2018
Jane: Open Source Machine Translation System Combination · EACL 2014
Jane 2: Open Source Phrase-based and Hierarchical Statistical Machine Translation · COLING 2012