Marta Bañón
6 papers · 2018–2025 · 3 conferences · across top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🌍 Conference Polyglot (3)
🏃 Academic Marathon (7)
🧭 Keyword Pioneer
🐝 Cross-Pollinator (10)
🌉 Interdisciplinary Bridge
🗺️ Taxonomy Completionist (12)
👥 Mega-Team (35)
Conferences
ACL (3)
COLING (2)
EMNLP (1)
Keywords
parallel corpus (5)
machine translation (4)
multilingual corpus (2)
corpus quality (2)
web crawling (2)
language model (1)
data selection (1)
human evaluation (1)
quality assessment (1)
data filtering (1)
text corpus (1)
sentence alignment (1)
automatic classifier (1)
parallel datum (1)
parallel corpus filtering (1)
corpus filtering (1)
language model pretraining (1)
spell checking (1)
multilingual text processing (1)
mutual translation (1)
Papers
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
ACL 2025
A New Massive Multilingual Dataset for High-Performance Language Technologies
COLING 2024
FastSpell: The LangId Magic Spell
COLING 2024
Human evaluation of web-crawled parallel corpora for machine translation
ACL 2022
ParaCrawl: Web-Scale Acquisition of Parallel Corpora
ACL 2020
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared task
EMNLP 2018