Philipp Koehn
149 papers · 2001–2025 · 10 top CS/AI conferences
Achievements
🗺️ Taxonomy Completionist (16)
🌈 Renaissance Researcher (5)
🐣 Hot Topic Early Bird
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🌟 Keyword Trendsetter Combo (3)
🏠 Conference Loyalist (67)
🐺 Lone Wolf (6)
👥 Mega-Team (36)
🏆 Keyword Champion (12)
🤝 Dynamic Duo (17)
🔬 Deep Specialist (11)
🔥 Unstoppable (12)
❓ The Questioner (3)
📈 Trend Setter
🗃️ Keyword Collector (59)
💎 Century Club (149)
🚀 Conference Pioneer
⚡ Prolific Year (9)
Conferences
EMNLP (67)
ACL (37)
NAACL (13)
IJCNLP (10)
EACL (9)
CONLL (6)
ICLR (3)
COLING (2)
AACL (1)
NIPS (1)
Research topics
Keywords
machine translation (45)
neural machine translation (28)
low-resource language (17)
parallel corpus (12)
domain adaptation (10)
transfer learning (9)
translation quality (9)
human evaluation (8)
continued training (6)
parallel corpus filtering (6)
document-level translation (5)
low-resource translation (5)
corpus filtering (5)
large language model (5)
translation evaluation (4)
sentence alignment (4)
multilingual translation (4)
news translation (4)
transformer architecture (3)
multilingual model (3)
Papers
Streaming Sequence Transduction through Dynamic Compression
ACL 2025
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
ICLR 2025
Findings of the WMT 2025 Shared Task of the Open Language Data Initiative
EMNLP 2025
Findings of the WMT25 Multilingual Instruction Shared Task: Persistent Hurdles in Reasoning, Generation, and Evaluation
EMNLP 2025
Findings of the WMT25 General Machine Translation Shared Task: Time to Stop Evaluating on Easy Test Sets
EMNLP 2025
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation
EMNLP 2025
Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
EMNLP 2025
Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
EMNLP 2025
Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
EMNLP 2025
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
NIPS 2024
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
ACL 2024
Recovering document annotations for sentence-level bitext
ACL 2024
Speech Data from Radio Broadcasts for Low Resource Languages
ACL 2024
Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved Yet
EMNLP 2024
Findings of the WMT 2024 Shared Task of the Open Language Data Initiative
EMNLP 2024
Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation
EMNLP 2024
Benchmarking Visually-Situated Translation of Text in Natural Images
EMNLP 2024
Neural Methods for Aligning Large-Scale Parallel Corpora from the Web for South and East Asian Languages
EMNLP 2024
Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models
ICLR 2024
Where are you from? Geolocating Speech and Applications to Language Identification
NAACL 2024
Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles
NAACL 2024
Pointer-Generator Networks for Low-Resource Machine Translation: Don’t Copy That!
NAACL 2024
Findings of the Word-Level AutoCompletion Shared Task in WMT 2023
EMNLP 2023
Findings of the WMT 2023 Shared Task on Parallel Data Curation
EMNLP 2023
Findings of the WMT 2023 Shared Task on Discourse-Level Literary Translation: A Fresh Orb in the Cosmos of LLMs
EMNLP 2023
Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet
EMNLP 2023
Multilingual Representation Distillation with Contrastive Learning
EACL 2023
Small Data, Big Impact: Leveraging Minimal Data for Effective Machine Translation
ACL 2023
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules
EMNLP 2023
Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer
EMNLP 2023
Machine Translation with Large Language Models: Prompting, Few-shot Learning, and Fine-tuning with QLoRA
EMNLP 2023
Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport
EMNLP 2022
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
ACL 2022
Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation
NAACL 2022
Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation
ICLR 2022
The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains
EMNLP 2022
Toward the Limitation of Code-Switching in Cross-Lingual Transfer
EMNLP 2022
IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces
EMNLP 2022
Data Selection Curriculum for Neural Machine Translation
EMNLP 2022
Findings of the 2022 Conference on Machine Translation (WMT22)
EMNLP 2022
Findings of the Word-Level AutoCompletion Shared Task in WMT 2022
EMNLP 2022
Findings of the 2021 Conference on Machine Translation (WMT21)
EMNLP 2021
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
ACL 2021
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
IJCNLP 2021
Evaluating Saliency Methods for Neural Language Models
NAACL 2021
Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation
EACL 2021
Learning Feature Weights using Reward Modeling for Denoising Parallel Corpora
EMNLP 2021
The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task
EMNLP 2021
Findings of the WMT Shared Task on Machine Translation Using Terminologies
EMNLP 2021
Facebook AI’s WMT21 News Translation Task Submission
EMNLP 2021
An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces
EMNLP 2021
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment
EMNLP 2021
Levenshtein Training for Word-level Quality Estimation
EMNLP 2021
Findings of the WMT 2020 Shared Task on Machine Translation Robustness
EMNLP 2020
An exploratory approach to the Parallel Corpus Filtering shared task WMT20
EMNLP 2020
When Does Unsupervised Machine Translation Work?
EMNLP 2020
Dual Conditional Cross Entropy Scores and LASER Similarity Scores for the WMT20 Parallel Corpus Filtering Shared Task
EMNLP 2020
Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment
EMNLP 2020
SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation
AACL 2020
Statistical Power and Translationese in Machine Translation Evaluation
EMNLP 2020
ParaCrawl: Web-Scale Acquisition of Parallel Corpora
ACL 2020
Simulated multiple reference training improves low-resource machine translation
EMNLP 2020
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
EMNLP 2020
Exploiting Sentence Order in Document Alignment
EMNLP 2020
TICO-19: the Translation Initiative for COvid-19
EMNLP 2020
Findings of the 2020 Conference on Machine Translation (WMT20)
EMNLP 2020
Simple Construction of Mixed-Language Texts for Vocabulary Learning
ACL 2019
Saliency-driven Word Alignment Interpretation for Neural Machine Translation
ACL 2019
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
ACL 2019
Findings of the 2019 Conference on Machine Translation (WMT19)
ACL 2019
Findings of the First Shared Task on Machine Translation Robustness
ACL 2019
Johns Hopkins University Submission for WMT News Translation Task
ACL 2019
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
ACL 2019
Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions
ACL 2019
Low-Resource Corpus Filtering Using Multilingual Sentence Embeddings
ACL 2019
Vecalign: Improved Sentence Alignment in Linear Time and Space
EMNLP 2019
HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation
EMNLP 2019
The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English
EMNLP 2019
Spelling-Aware Construction of Macaronic Texts for Teaching Foreign-Language Vocabulary
EMNLP 2019
Parallelizable Stack Long Short-Term Memory
NAACL 2019
Overcoming Catastrophic Forgetting During Domain Adaptation of Neural Machine Translation
NAACL 2019
Vecalign: Improved Sentence Alignment in Linear Time and Space
IJCNLP 2019
HABLex: Human Annotated Bilingual Lexicons for Experiments in Machine Translation
IJCNLP 2019
The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English
IJCNLP 2019
Spelling-Aware Construction of Macaronic Texts for Teaching Foreign-Language Vocabulary
IJCNLP 2019
De-Mixing Sentiment from Code-Mixed Text
ACL 2019
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
ACL 2019
The JHU Machine Translation Systems for WMT 2018
EMNLP 2018
Document-Level Adaptation for Neural Machine Translation
ACL 2018
Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation
ACL 2018
Iterative Back-Translation for Neural Machine Translation
ACL 2018
Proceedings of the Third Conference on Machine Translation: Research Papers
EMNLP 2018
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
EMNLP 2018
On the Impact of Various Types of Noise on Neural Machine Translation
ACL 2018
Context and Copying in Neural Machine Translation
EMNLP 2018
The JHU Parallel Corpus Filtering Systems for WMT 2018
EMNLP 2018
Findings of the WMT 2018 Shared Task on Parallel Corpus Filtering
EMNLP 2018
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
EMNLP 2018
Findings of the 2018 Conference on Machine Translation (WMT18)
EMNLP 2018
Neural Lattice Search for Domain Adaptation in Machine Translation
IJCNLP 2017
CADET: Computer Assisted Discovery Extraction and Translation
IJCNLP 2017
Knowledge Tracing in Sequential Learning of Inflected Vocabulary
CONLL 2017
Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora
EMNLP 2017
Analyzing Learner Understanding of Novel L2 Vocabulary
CONLL 2016
User Modeling in Language Learning with Macaronic Texts
ACL 2016
Creating Interactive Macaronic Interfaces for Language Learning
ACL 2016
Computer Aided Translation
ACL 2016
Syntax-Based Statistical Machine Translation
EMNLP 2014
The MateCat Tool
COLING 2014
Integrating an Unsupervised Transliteration Model into Statistical Machine Translation
EACL 2014
Investigating the Usefulness of Generalized Word Representations in SMT
COLING 2014
Dynamic Topic Adaptation for Phrase-based MT
EACL 2014
CASMACAT: A Computer-assisted Translation Workbench
EACL 2014
Refinements to Interactive Translation Prediction Based on Search Graphs
ACL 2014
Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
ACL 2013
Dirt Cheap Web-Scale Parallel Text from the Common Crawl
ACL 2013
Scalable Modified Kneser-Ney Language Model Estimation
ACL 2013
Grouping Language Model Boundary Words to Speed K–Best Extraction from Hypergraphs
NAACL 2013
Learning to Prune: Context-Sensitive Pruning for Syntactic MT
ACL 2013
Language Model Rest Costs and Space-Efficient Storage
CONLL 2012
Language Model Rest Costs and Space-Efficient Storage
EMNLP 2012
Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
EMNLP 2011
Enabling Monolingual Translators: Post-Editing vs. Options
NAACL 2010
A Web-Based Interactive Computer Aided Translation Tool
ACL 2009
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
EMNLP 2009
Word Lattices for Multi-Source Translation
EACL 2009
Improving Mid-Range Re-Ordering Using Templates of Factors
EACL 2009
Monte Carlo inference and maximization for phrase-based translation
CONLL 2009
Topics in Statistical Machine Translation
ACL 2009
A Web-Based Interactive Computer Aided Translation Tool
IJCNLP 2009
Topics in Statistical Machine Translation
IJCNLP 2009
Large and Diverse Language Models for Statistical Machine Translation
IJCNLP 2008
Enriching Morphologically Poor Languages for Statistical Machine Translation
ACL 2008
Predicting Success in Machine Translation
EMNLP 2008
Factored Translation Models
EMNLP 2007
Chinese Syntactic Reordering for Statistical Machine Translation
EMNLP 2007
Factored Translation Models
CONLL 2007
Chinese Syntactic Reordering for Statistical Machine Translation
CONLL 2007
Moses: Open Source Toolkit for Statistical Machine Translation
ACL 2007
Improved Statistical Machine Translation Using Paraphrases
NAACL 2006
Re-evaluating the Role of Bleu in Machine Translation Research
EACL 2006
Clause Restructuring for Statistical Machine Translation
ACL 2005
Statistical Significance Tests for Machine Translation Evaluation
EMNLP 2004
Feature-Rich Statistical Translation of Noun Phrases
ACL 2003
Statistical Phrase-Based Translation
NAACL 2003
Desperately Seeking Cebuano
NAACL 2003
What’s New in Statistical Machine Translation
NAACL 2003
Empirical Methods for Compound Splitting
EACL 2003
Knowledge Sources for Word-Level Translation Models
EMNLP 2001