Jaume Zaragoza-Bernabeu
5 papers · 2020–2025 · across 3 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (12) 🏃 Academic Marathon (5) 🐣 Hot Topic Early Bird 🌍 Conference Polyglot (3) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 👥 Mega-Team (35)
Conferences
ACL (2)
COLING (2)
EMNLP (1)
Keywords
parallel corpus (3)
machine translation (3)
corpus quality (2)
multilingual corpus (2)
language model (1)
human evaluation (1)
web crawling (1)
quality assessment (1)
text corpus (1)
parallel data (1)
lexical similarity (1)
parallel corpus filtering (1)
language model pretraining (1)
spell checking (1)
character-level language model (1)
multilingual text processing (1)
n-gram saturation (1)
text classification (1)
extremely randomised trees (1)
multilingual NLP (1)
Papers
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
ACL 2025
A New Massive Multilingual Dataset for High-Performance Language Technologies
COLING 2024
FastSpell: The LangId Magic Spell
COLING 2024
Human evaluation of web-crawled parallel corpora for machine translation
ACL 2022
Bicleaner at WMT 2020: Universitat d’Alacant-Prompsit’s submission to the parallel corpus filtering shared task
EMNLP 2020