Thibault Clerice
6 papers · 2024–2026 · 4 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (2) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (14)
❓ The Questioner
Conferences
EACL (2)
EMNLP (2)
ACL (1)
COLING (1)
Research topics
Keywords
text classification (3)
language identification (3)
corpus linguistics (1)
historical linguistics (1)
corpus creation (1)
prior probability (1)
support vector machine (1)
data mining (1)
human annotation (1)
sentence classification (1)
bert fine-tuning (1)
character n-gram (1)
dialect identification (1)
multilingual corpus (1)
hierarchical attention network (1)
semantic classification (1)
corpus building (1)
creole language (1)
language contact (1)
colonial france (1)
Papers
OcWikiDialects: A Wikipedia Dataset With Rich Metadata for Occitan Dialect Identification
EACL 2026
How Should We Model the Probability of a Language?
EACL 2026
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
ACL 2026
Identifying Rare Languages in Common Crawl Data is a Needles-in-a-Haystack Problem
EMNLP 2025
Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts
COLING 2024
Molyé: A Corpus-based Approach to Language Contact in Colonial France
EMNLP 2024