conftrace_
2024 INTERSPEECH INTERSPEECH 2024

Automatic pitch accent classification through image classification

Abstract

The classification of pitch accents has posed significant challenges in automatic intonation labelling. Previous research primarily adopted feature-based approaches, predicting pitch accents using a finite set of features including acoustic features (F0, duration, intensity) and lexical features. In this study, we explored a novel approach, classifying pitch accents as images represented in pixels. To evaluate this method’s effectiveness, we used a relatively simple classification task involving only two types of pitch accents (H* and L+H*). The training of a basic neural network model for classifying images of these two types of accents (N= 2,025) yielded an average accuracy of 93.5% across 10 runs on the test set, showcasing the potential effectiveness of this new approach.

🌉 Interdisciplinary Bridge - Computer Vision and Machine Learning
🧭 Keyword Pioneer - intonation labeling
🐝 Cross-Pollinator - Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio