Gema Ramírez-Sánchez
7 papers · 2020–2025 · across 2 top CS/AI conferences
Achievements
🐣 Hot Topic Early Bird
🧭 Keyword Pioneer
🌍 Conference Polyglot (2)
🏃 Academic Marathon (5)
🐝 Cross-Pollinator (10)
🌉 Interdisciplinary Bridge
🗺️ Taxonomy Completionist (11)
👥 Mega-Team (35)
🏆 Keyword Champion (3)
❓ The Questioner
Conferences
COLING (4)
ACL (3)
Keywords
parallel corpus (5)
machine translation (4)
corpus quality (3)
multilingual corpus (2)
web crawling (2)
multilingual NLP (2)
downstream task (1)
data quality (1)
human evaluation (1)
multilingual evaluation (1)
quality assessment (1)
data filtering (1)
text corpus (1)
sentence alignment (1)
parallel data (1)
corpus filtering (1)
language model pretraining (1)
spell checking (1)
multilingual text processing (1)
web-crawled corpus (1)
Papers
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT)
ACL 2025
Quality Beyond A Glance: Revealing Large Quality Differences Between Web-Crawled Parallel Corpora
COLING 2025
A New Massive Multilingual Dataset for High-Performance Language Technologies
COLING 2024
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages
COLING 2024
FastSpell: The LangId Magic Spell
COLING 2024
Human evaluation of web-crawled parallel corpora for machine translation
ACL 2022
ParaCrawl: Web-Scale Acquisition of Parallel Corpora
ACL 2020