Research Explorer
parallel corpus
365 papers
Also known as
PDC
Co-occurring keywords
machine translation (2472)
neural machine translation (2310)
low-resource language (2234)
multilingual nlp (1423)
corpus filtering (41)
sentence alignment (57)
cross-lingual transfer (1468)
data augmentation (3037)
multilingual corpus (100)
multilingual model (920)
Papers
Curated Datasets and Neural Models for Machine Translation of Informal Registers between Mayan and Spanish Vernaculars
NAACL 2024
GUIDE: Creating Semantic Domain Dictionaries for Low-Resource Languages
EACL 2024
Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin
COLING 2024
Evaluation Dataset for Japanese Medical Text Simplification
NAACL 2024
Massively Multilingual Token-Based Typology Using the Parallel Bible Corpus
COLING 2024
Expanding the FLORES+ Multilingual Benchmark with Translations for Aragonese, Aranese, Asturian, and Valencian
EMNLP 2024
A Bit of a Problem: Measurement Disparities in Dataset Sizes across Languages
COLING 2024
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages Using Wikidata
COLING 2024
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
ACL 2024
The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration
COLING 2024
AGE: Amharic, Ge’ez and English Parallel Dataset
ACL 2024
Italian-Ligurian Machine Translation in Its Cultural Context
COLING 2024
Samayik: A Benchmark and Dataset for English-Sanskrit Translation
COLING 2024
The KIND Dataset: A Social Collaboration Approach for Nuanced Dialect Data Collection
EACL 2024
Exploring Word Formation Trends in Written, Spoken, Translated and Interpreted European Parliament Data – A Case Study on Initialisms in English and German
COLING 2024
SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation
COLING 2024
Findings of WMT 2024 Shared Task on Low-Resource Indic Languages Translation
EMNLP 2024
FFSTC: Fongbe to French Speech Translation Corpus
COLING 2024
EthioMT: Parallel Corpus for Low-resource Ethiopian Languages
COLING 2024
Recovering document annotations for sentence-level bitext
ACL 2024
GTNC: A Many-To-One Dataset of Google Translations from NewsCrawl
EACL 2024
NAIST-SIC-Aligned: An Aligned English-Japanese Simultaneous Interpretation Corpus
COLING 2024
Improving Vietnamese-English Medical Machine Translation
COLING 2024
The Parallel Corpus of Russian and Ruska Romani Languages
ACL 2024
Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking
COLING 2024