Thomas Bauwens
6 papers · 2024–2026 · 4 conferences · across top CS/AI conferences
Achievements
Conference Polyglot (3)
Interdisciplinary Bridge
Keyword Pioneer
Taxonomy Completionist (15)
Cross-Pollinator (12)
Renaissance Researcher (6)
The Questioner
Conferences
EMNLP (3)
ACL (1)
EACL (1)
NAACL (1)
Keywords
subword tokenization (2)
causal language modeling (2)
morphological complexity (2)
language model (2)
text representation (1)
linguistic knowledge (1)
markov chain monte carlo (1)
markov model (1)
pre-trained language model (1)
perplexity analysis (1)
confounding factor (1)
morphological analysis (1)
byte-pair encoding (1)
visual capability (1)
subword tokenisation (1)
path counting (1)
morphological alignment (1)
semantic abstraction (1)
intrinsic evaluation (1)
subword regularization (1)
Papers
ReBPE: Iteratively Improving the Internal Structure of a Structured Tokeniser by Mining its Internal Structure
EACL 2026
GRaMPa: Subword Regularisation by Skewing Uniform Segmentation Distributions with an Efficient Path-counting Markov Model
ACL 2025
Confounding Factors in Relating Model Performance to Morphology
EMNLP 2025
How Can We Relate Language Modeling to Morphology?
EMNLP 2025
Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
EMNLP 2024
BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision
NAACL 2024