2024 COLING COLING 2024

Leveraging Domain Corpora for Enhanced Terminology: The Case of Estonian-English Remote Sensing Termbase

Abstract

AbstractThis article addresses methodological issues related to developing domain corpora and a terminological database from scratch. We present an ongoing project focused on creating an Estonian-English Remote Sensing Termbase. First, we describe the compilation process of the Estonian Remote Sensing Corpus 2022 , which served as the primary data source for the termbase. The corpus was compiled by crawling the web and adding files using the Corpus Query System Sketch Engine (Kilgarriff et al., 2004). In the next step, we employed the Term Extraction module (Kilgarriff et al., 2014; Fišer et al., 2016; Blahuš et al., 2023) to identify terms, which were subsequently registered in the Estonian Remote Sensing Termbase using the Dictionary Writing System Ekilex (Tavast et al., 2018). For each term, we provided definitions, variants, and usage contexts. In the final stage, remote sensing experts reviewed and edited the terms, their variants, and usage contexts. Finally, we provide insights and outline directions for future work in this area.

🌉 Interdisciplinary Bridge — Computer Vision and Machine Learning
🧭 Keyword Pioneer — domain corpus
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning