François Yvon

85 papers · 2005–2026 · 8 conferences · across top CS/AI conferences

Achievements


🌍 Conference Polyglot (8) 🏃 Academic Marathon (20) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🌈 Renaissance Researcher (9) 🌟 Keyword Trendsetter Combo (4) 🏠 Conference Loyalist (29) 👥 Mega-Team (20) 🤝 Dynamic Duo (14) 🔬 Deep Specialist (24) 🧬 Topic Evolution 🏆 Keyword Champion (2) 📈 Trend Setter ⚡ Prolific Year (5) 🚀 Conference Pioneer ❓ The Questioner (3) 🔥 Unstoppable (12) 💎 Century Club (82) 🗃️ Keyword Collector (294)

Conferences

EMNLP (29) ACL (17) COLING (14) NAACL (11) EACL (7) CONLL (3) INTERSPEECH (3) NIPS (1)

Top co-authors

Guillaume Wisniewski (14) Amir Hossein Kargaran (10) Josep Crego (9) Hinrich Schuetze (8) Jitao Xu (8) Hinrich Schütze (7) Alexandre Allauzen (6) Lauriane Aufrant (6) Ayyoob Imani (5) Masoud Jalili Sabet (5)

Keywords

machine translation (16) neural machine translation (10) large language model (9) low-resource language (9) dependency parsing (6) multilingual nlp (6) word alignment (5) word segmentation (4) language documentation (4) domain adaptation (4) dynamic oracle (3) part-of-speech tagging (3) language identification (3) sequence labeling (3) conditional random field (3) in-context learning (3) text generation (3) representation learning (3) parallel corpus (3) retrieval-augmented generation (3)

Papers

AdaptBPE: From General Purpose to Specialized Tokenizers EACL 2026
The GDN-CC Dataset: Automatic Corpus Clarification for AI-enhanced Democratic Citizen Consultations ACL 2026
Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions EACL 2026
How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study EMNLP 2025
Tracing Multilingual Factual Knowledge Acquisition in Pretraining EMNLP 2025
An Interdisciplinary Approach to Human-Centered Machine Translation EMNLP 2025
On Relation-Specific Neurons in Large Language Models EMNLP 2025
MOSAIC at GENAI Detection Task 3 : Zero-Shot Detection Using an Ensemble of Models COLING 2025
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu ACL 2025
MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines ACL 2025
MOSAIC: Multiple Observers Spotting AI Content ACL 2025
How Programming Concepts and Neurons Are Shared in Code Language Models ACL 2025
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ACL 2025
Prompting LLMs: Length Control for Isometric Machine Translation ACL 2025
How Transliterations Improve Crosslingual Alignment COLING 2025
Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs COLING 2025
Towards the Machine Translation of Scientific Neologisms COLING 2025
Self-Retrieval from Distant Contexts for Document-Level Machine Translation EMNLP 2025
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages NIPS 2024
MaskLID: Code-Switching Language Identification through Iterative Masking ACL 2024
GlotScript: A Resource and Tool for Low Resource Writing System Identification COLING 2024
Invited Talk: The Way Towards Massively Multilingual Language Models COLING 2024
Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison NAACL 2024
Towards Multilingual Interlinear Morphological Glossing EMNLP 2023
Towards Example-Based NMT with Multi-Levenshtein Transformers EMNLP 2023
Structural generalization in COGS: Supertagging is (almost) all you need EMNLP 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ACL 2023
Integrating Translation Memories into Non-Autoregressive Machine Translation EACL 2023
Joint Word and Morpheme Segmentation with Bayesian Non-Parametric Models EACL 2023
BiSync: A Bilingual Editor for Synchronized Monolingual Texts ACL 2023
LISN @ SIGMORPHON 2023 Shared Task on Interlinear Glossing ACL 2023
Assessing Word Importance Using Models Trained for Semantic Tasks ACL 2023
GlotLID: Language Identification for Low-Resource Languages EMNLP 2023
Bilingual Synchronization: Restoring Translational Relationships with Editing Operations EMNLP 2022
Latent Group Dropout for Multilingual and Multidomain Machine Translation NAACL 2022
Weakly Supervised Word Segmentation for Computational Language Documentation ACL 2022
Graph Neural Networks for Multiparallel Word Alignment ACL 2022
Joint Generation of Captions and Subtitles with Dual Decoding ACL 2022
Analyzing Gender Translation Errors to Identify Information Flows between the Encoder and Decoder of a NMT System EMNLP 2022
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging EMNLP 2022
Screening Gender Transfer in Neural Machine Translation EMNLP 2021
Graph Algorithms for Multiparallel Word Alignment EMNLP 2021
One Source, Two Targets: Challenges and Rewards of Dual Decoding EMNLP 2021
LISN @ WMT 2021 EMNLP 2021
Toward Genre Adapted Closed Captioning INTERSPEECH 2021
Can You Traducir This? Machine Translation for Code-Switched Input NAACL 2021
SimAlign: High Quality Word Alignments Without Parallel Training Data Using Static and Contextualized Embeddings EMNLP 2020
Priming Neural Machine Translation EMNLP 2020
A Study of Residual Adapters for Multi-Domain Neural Machine Translation EMNLP 2020
LIMSI @ WMT 2020 EMNLP 2020
How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. NAACL 2019
Measuring text readability with machine comprehension: a pilot study ACL 2019
Using Monolingual Data in Neural Machine Translation: a Systematic Study EMNLP 2018
The WMT’18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English EMNLP 2018
Quantifying training challenges of dependency parsers COLING 2018
Unsupervised Word Segmentation from Speech with Attention INTERSPEECH 2018
Automatically Selecting the Best Dependency Annotation Design with Dynamic Oracles NAACL 2018
Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees NAACL 2018
Fixing Translation Divergences in Parallel Corpora for Neural MT EMNLP 2018
Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages EMNLP 2018
Don’t Stop Me Now! Using Global Dynamic Oracles to Correct Training Biases of Transition-Based Dependency Parsers EACL 2017
LIMSI@CoNLL’17: UD Shared Task CONLL 2017
Learning the Structure of Variable-Order CRFs: a finite-state perspective EMNLP 2017
TransRead: Designing a Bilingual Reading Experience with Machine Translation Technologies NAACL 2016
Preliminary Experiments on Unsupervised Word Discovery in Mboshi INTERSPEECH 2016
Parallel Sentence Compression COLING 2016
Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge COLING 2016
Frustratingly Easy Cross-Lingual Transfer for Transition-Based Dependency Parsing NAACL 2016
A Discriminative Training Procedure for Continuous Translation Models EMNLP 2015
Cross-Lingual Part-of-Speech Tagging through Ambiguous Learning EMNLP 2014
Computing Lattice BLEU Oracle Scores for Machine Translation EACL 2012
Measuring the Influence of Long Range Dependencies with Neural Network Language Models NAACL 2012
Continuous Space Translation Models with Neural Networks NAACL 2012
Aligning Bilingual Literary Works: a Pilot Study NAACL 2012
Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT COLING 2010
Improving Reordering with Linguistically Informed Bilingual n-grams COLING 2010
Training Continuous Space Language Models: Some Practical Issues EMNLP 2010
Assessing Phrase-Based Translation Models with Oracle Decoding EMNLP 2010
Practical Very Large Scale CRFs ACL 2010
Improvements in Analogical Learning: Application to Translating Multi-Terms of the Medical Domain EACL 2009
Normalizing SMS: are Two Metaphors Better than One ? COLING 2008
Robust Similarity Measures for Named Entities Matching COLING 2008
Using LDA to detect semantically incoherent documents CONLL 2008
Scaling up Analogical Learning COLING 2008
An Analogical Learner for Morphological Analysis CONLL 2005