Benoît Sagot
61 papers · 2006–2026 · 10 conferences · across top CS/AI conferences
Achievements
🏃 Academic Marathon (19)
🌍 Conference Polyglot (10)
🧭 Keyword Pioneer
🗺️ Taxonomy Completionist (10)
🐝 Cross-Pollinator (8)
🌈 Renaissance Researcher (6)
🌉 Interdisciplinary Bridge
🤝 Dynamic Duo (16)
🏆 Keyword Champion (2)
🔬 Deep Specialist (15)
⚡ Prolific Year (6)
📈 Trend Setter
💎 Century Club (57)
🗃️ Keyword Collector (206)
🔥 Unstoppable (9)
❓ The Questioner (7)
Conferences
EMNLP (18)
ACL (15)
COLING (7)
NAACL (6)
EACL (5)
INTERSPEECH (3)
CONLL (2)
ICLR (2)
IJCNLP (2)
AACL (1)
Research topics
Keywords
low-resource language (8)
machine translation (8)
zero-shot learning (5)
transfer learning (5)
cross-lingual transfer (5)
large language model (5)
text classification (5)
multilingual nlp (4)
in-context learning (4)
dependency parsing (4)
language identification (3)
part-of-speech tagging (3)
multilingual language model (3)
multilingual corpus (3)
language model (3)
user-generated content (3)
model architecture (2)
multimodal learning (2)
sample efficiency (2)
parallel corpus (2)
Papers
Cross-lingual and cross-country approaches to argument component detection: a comparative study.
EACL 2026
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
ACL 2026
How Should We Model the Probability of a Language?
EACL 2026
OcWikiDialects: A Wikipedia Dataset With Rich Metadata for Occitan Dialect Identification
EACL 2026
Towards Zero-Shot Multimodal Machine Translation
NAACL 2025
RoCS-MT v2 at WMT 2025: Robust Challenge Set for Machine Translation
EMNLP 2025
A French Version of the OLDI Seed Corpus
EMNLP 2025
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation
NAACL 2025
Explicit Learning and the LLM in Machine Translation
EMNLP 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
IJCNLP 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
AACL 2025
Identifying Rare Languages in Common Crawl Data is a Needles-in-a-Haystack Problem
EMNLP 2025
Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation
EMNLP 2025
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
EMNLP 2025
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus
ACL 2025
Tree of Problems: Improving structured problem solving with compositionality
EMNLP 2024
PatentEval: Understanding Errors in Patent Generation
NAACL 2024
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
ICLR 2024
From Text to Source: Results in Detecting Large Language Model-Generated Content
COLING 2024
Making Sentence Embeddings Robust to User-Generated Content
COLING 2024
On the Scaling Laws of Geographical Representation in Language Models
COLING 2024
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages
COLING 2024
Anisotropy Is Inherent to Self-Attention in Transformers
EACL 2024
Molyé: A Corpus-based Approach to Language Contact in Colonial France
EMNLP 2024
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
ACL 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
EMNLP 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
ACL 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
INTERSPEECH 2023
RoCS-MT: Robustness Challenge Set for Machine Translation
EMNLP 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
EMNLP 2023
Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication
ICLR 2023
Data-Efficient French Language Modeling with CamemBERTa
ACL 2023
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling
EMNLP 2022
Probing Multilingual Cognate Prediction Models
ACL 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
EMNLP 2022
The MRL 2022 Shared Task on Multilingual Clause-level Morphology
EMNLP 2022
Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation?
EMNLP 2022
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning
INTERSPEECH 2022
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
EMNLP 2021
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
EACL 2021
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
IJCNLP 2021
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
ACL 2021
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
NAACL 2021
Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios?
EMNLP 2021
Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell
ACL 2020
CamemBERT: a Tasty French Language Model
ACL 2020
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
ACL 2020
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
ACL 2020
Evaluating the Reliability of Acoustic Speech Embeddings
INTERSPEECH 2020
Enhancing BERT for Lexical Normalization
EMNLP 2019
What Does BERT Learn about the Structure of Language?
ACL 2019
ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing
CONLL 2018
The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy
CONLL 2017
Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining
NAACL 2013
The French Social Media Bank: a Treebank of Noisy User Generated Content
COLING 2012
Unsupervized Word Segmentation: the Case for Mandarin Chinese
ACL 2012
Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
ACL 2010
MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note)
NAACL 2009
Computer Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon
COLING 2008
Error Mining in Parsing Results
ACL 2006
Error Mining in Parsing Results
COLING 2006