Speech foundation models in healthcare: Effect of layer selection on pathological speech feature prediction

Daniela A. Wiepert; Rene L. Utianski; Joseph R. Duffy; John L. Stricker; Leland R. Barnard; David T. Jones; Hugo Botha

INTERSPEECH 2024

Abstract

Accurately extracting clinical information from speech is critical to the diagnosis and treatment of many neurological conditions. As such, there is interest in leveraging AI for automatic, objective assessments of clinical speech to facilitate diagnosis and treatment of speech disorders. We explore transfer learning using foundation models, focusing on the impact of layer selection for the downstream task of predicting pathological speech features. We find that selecting an optimal layer can greatly improve performance (15.8% increase in balanced accuracy per feature compared with the worst layer, 13.6% increase compared with the final layer), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum of layers offers performance comparable to the average best layer in-distribution (only 1.2% lower) and generalizes well to out-of-distribution data (only 1.5% lower than the average best layer).
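To make the idea of a weighted sum over layers concrete, the following is a minimal sketch, not the authors' implementation. It assumes the per-layer weights are normalized with a softmax (the abstract does not state the parameterization), and the function names, the 13-layer count, and the 768-dimensional hidden size are hypothetical placeholders. In practice the layer logits would be trained jointly with the downstream classifier; here they are fixed at zero for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

def weighted_layer_sum(hidden_states, layer_logits):
    """Combine per-layer representations into one sequence of frames.

    hidden_states: array of shape (num_layers, num_frames, dim)
    layer_logits:  array of shape (num_layers,), one scalar per layer
                   (assumed to be learnable in a real model)
    returns:       array of shape (num_frames, dim)
    """
    weights = softmax(layer_logits)  # non-negative, sums to 1
    # Contract the layer axis: sum_l weights[l] * hidden_states[l]
    return np.tensordot(weights, hidden_states, axes=1)

# Illustrative stand-in for foundation-model outputs (random values).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(13, 50, 768))

# Zero logits give uniform weights, i.e. a plain average over layers.
layer_logits = np.zeros(13)

fused = weighted_layer_sum(hidden_states, layer_logits)

# Mean-pool over time to get one embedding per recording
# (the pooling choice is also an assumption).
utterance_embedding = fused.mean(axis=0)
```

Selecting a single layer is the special case in which one weight approaches 1 and the rest approach 0, so a weighted sum of this kind can in principle recover any per-layer choice without committing to one in advance.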

Topics

Artificial Intelligence > Core AI > Foundation Models
Artificial Intelligence > Learning Paradigms > Transfer Learning
Computer Vision > Domain-Specific > Medical Imaging

Keywords

transfer learning; foundation model; layer selection; pathological speech; clinical speech
