2019
EMNLP
CoSSAT: Code-Switched Speech Annotation Tool
Abstract
Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present CoSSAT, a speech annotation interface that helps annotators transcribe code-switched speech faster, more easily, and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.
🌉
Interdisciplinary Bridge
- Natural Language Processing and Speech & Audio
🧭
Keyword Pioneer
- transcription interface
🐣
Hot Topic Early Bird
- multilingual speech
🐝
Cross-Pollinator
- Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio