Teven Le Scao

12 papers · 2020–2025 · 9 top CS/AI conferences

Achievements


🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (9) 🏃 Academic Marathon (5) 🗺️ Taxonomy Completionist (20)

🐣 Hot Topic Early Bird 👑 Triple Crown 👥 Mega-Team (54) 💎 Century Club (12) 📈 Trend Setter ❓ The Questioner (3)

Conferences

EMNLP (3) NIPS (2) AACL (1) ACL (1) ICLR (1) ICML (1) IJCNLP (1) JMLR (1) NAACL (1)

Top co-authors

Thomas Wolf (5) Colin Raffel (5) Thomas Wang (5) Niklas Muennighoff (4) Stella Biderman (4) Alexander Rush (4) Victor Sanh (4) Canwen Xu (3) Yacine Jernite (3) Zheng Xin Yong (3)

Research topics

Resources & Methods (1)

Keywords

zero-shot generalization (3) large language model (3) transformer architecture (3) language model (2) data repetition (2) model scaling (2) multilingual language model (2) prompt engineering (2) model pretraining (1) scaling law (1) language model training (1) natural language processing (1) model fine-tuning (1) scaling behavior (1) model training (1) data curation (1) multilingual model (1) token efficiency (1) pretrained model (1) pretraining corpus (1)

Papers

Scaling Data-Constrained Language Models · JMLR 2025
Joint Representations of Text and Knowledge Graphs for Retrieval and Evaluation · IJCNLP 2023
Scaling Data-Constrained Language Models · NIPS 2023
Joint Representations of Text and Knowledge Graphs for Retrieval and Evaluation · AACL 2023
Crosslingual Generalization through Multitask Finetuning · ACL 2023
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset · NIPS 2022
What Language Model to Train if You Have One Million GPU Hours? · EMNLP 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization · ICLR 2022
What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? · ICML 2022
Datasets: A Community Library for Natural Language Processing · EMNLP 2021
How many data points is a prompt worth? · NAACL 2021
Transformers: State-of-the-Art Natural Language Processing · EMNLP 2020