conftrace_

Benoît Sagot

61 papers · 2006–2026 · 10 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (19) 🌍 Conference Polyglot (10) 🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🐝 Cross-Pollinator (8)

🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🤝 Dynamic Duo (16) 🏆 Keyword Champion (2) 🔬 Deep Specialist (15) ⚡ Prolific Year (6) 📈 Trend Setter 💎 Century Club (57) 🗃️ Keyword Collector (206) 🔥 Unstoppable (9) ❓ The Questioner (7)

Conferences

EMNLP (18) ACL (15) COLING (7) NAACL (6) EACL (5) INTERSPEECH (3) CONLL (2) ICLR (2) IJCNLP (2) AACL (1)

Top co-authors

Rachel Bawden (17) Djamé Seddah (16) Benjamin Muller (8) Éric de la Clergerie (7) Thibault Clerice (5) Armel Randy Zebaze (5) Emmanuel Dupoux (5) Pedro Ortiz Suarez (5) Rasul Dent (4) Nathan Godey (4)

Research topics

Linguistics (1) Digital Humanities (1)

Keywords

low-resource language (8) machine translation (8) zero-shot learning (5) transfer learning (5) cross-lingual transfer (5) large language model (5) text classification (5) multilingual nlp (4) in-context learning (4) dependency parsing (4) language identification (3) part-of-speech tagging (3) multilingual language model (3) multilingual corpus (3) language model (3) user-generated content (3) model architecture (2) multimodal learning (2) sample efficiency (2) parallel corpus (2)

Papers

Cross-lingual and cross-country approaches to argument component detection: a comparative study. · EACL 2026
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data · ACL 2026
How Should We Model the Probability of a Language? · EACL 2026
OcWikiDialects: A Wikipedia Dataset With Rich Metadata for Occitan Dialect Identification · EACL 2026
Towards Zero-Shot Multimodal Machine Translation · NAACL 2025
RoCS-MT v2 at WMT 2025: Robust Challenge Set for Machine Translation · EMNLP 2025
A French Version of the OLDI Seed Corpus · EMNLP 2025
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation · NAACL 2025
Explicit Learning and the LLM in Machine Translation · EMNLP 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance · IJCNLP 2025
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance · AACL 2025
Identifying Rare Languages in Common Crawl Data is a Needles-in-a-Haystack Problem · EMNLP 2025
Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation · EMNLP 2025
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation · EMNLP 2025
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus · ACL 2025
Tree of Problems: Improving structured problem solving with compositionality · EMNLP 2024
PatentEval: Understanding Errors in Patent Generation · NAACL 2024
Headless Language Models: Learning without Predicting with Contrastive Weight Tying · ICLR 2024
From Text to Source: Results in Detecting Large Language Model-Generated Content · COLING 2024
Making Sentence Embeddings Robust to User-Generated Content · COLING 2024
On the Scaling Laws of Geographical Representation in Language Models · COLING 2024
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages · COLING 2024
Anisotropy Is Inherent to Self-Attention in Transformers · EACL 2024
Molyé: A Corpus-based Approach to Language Contact in Colonial France · EMNLP 2024
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations · ACL 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words · EMNLP 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation · ACL 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer · INTERSPEECH 2023
RoCS-MT: Robustness Challenge Set for Machine Translation · EMNLP 2023
Generative Spoken Language Model based on continuous word-sized audio tokens · EMNLP 2023
Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication · ICLR 2023
Data-Efficient French Language Modeling with CamemBERTa · ACL 2023
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling · EMNLP 2022
Probing Multilingual Cognate Prediction Models · ACL 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation · EMNLP 2022
The MRL 2022 Shared Task on Multilingual Clause-level Morphology · EMNLP 2022
Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation? · EMNLP 2022
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning · INTERSPEECH 2022
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering · EMNLP 2021
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT · EACL 2021
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? · IJCNLP 2021
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? · ACL 2021
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models · NAACL 2021
Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios? · EMNLP 2021
Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell · ACL 2020
CamemBERT: a Tasty French Language Model · ACL 2020
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages · ACL 2020
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations · ACL 2020
Evaluating the Reliability of Acoustic Speech Embeddings · INTERSPEECH 2020
Enhancing BERT for Lexical Normalization · EMNLP 2019
What Does BERT Learn about the Structure of Language? · ACL 2019
ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing · CONLL 2018
The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy · CONLL 2017
Enforcing Subcategorization Constraints in a Parser Using Sub-parses Recombining · NAACL 2013
The French Social Media Bank: a Treebank of Noisy User Generated Content · COLING 2012
Unsupervized Word Segmentation: the Case for Mandarin Chinese · ACL 2012
Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two · ACL 2010
MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note) · NAACL 2009
Computer Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon · COLING 2008
Error Mining in Parsing Results · ACL 2006
Error Mining in Parsing Results · COLING 2006