Maurice Weber
5 papers · 2022–2025 · 3 top CS/AI conferences
Achievements
Conference Polyglot (3)
Renaissance Researcher (6)
Interdisciplinary Bridge
Taxonomy Completionist (18)
Keyword Pioneer
Hot Topic Early Bird
Cross-Pollinator (15)
Conferences
NIPS (3)
EMNLP (1)
ICLR (1)
Keywords
large language model (2)
convex optimization (1)
machine translation (1)
multilingual nlp (1)
cross-lingual transfer (1)
data annotation (1)
document understanding (1)
web corpus (1)
continued pretraining (1)
corpus creation (1)
distributional robustness (1)
distributionally robust optimization (1)
model training (1)
data quality (1)
data curation (1)
multilingual language model (1)
language model pretraining (1)
annotation pipeline (1)
text extraction (1)
document layout (1)
Papers
Multilingual Language Model Pretraining using Machine-translated Data
EMNLP 2025
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
ICLR 2025
RedPajama: an Open Dataset for Training Large Language Models
NIPS 2024
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
NIPS 2023
Certifying Some Distributional Fairness with Subpopulation Decomposition
NIPS 2022