Papers
Assessing the Use of Prosody in Constituency Parsing of Imperfect Transcripts
Trang Tran, Mari Ostendorf
Assessment of von Mises-Bernoulli Deep Neural Network in Sound Source Localization
Katsutoshi Itoyama, Yoshiya Morimoto, Shungo Masaki et al.
AST: Audio Spectrogram Transformer
Yuan Gong, Yu-An Chung, James Glass
A Study into Pre-Training Strategies for Spoken Language Understanding on Dysarthric Speech
Pu Wang, Bagher BabaAli, Hugo Van hamme
A Study on Fine-Tuning wav2vec2.0 Model for the Task of Mispronunciation Detection and Diagnosis
Linkai Peng, Kaiqi Fu, Binghuai Lin et al.
A Systematic Review and Analysis of Multilingual Data Strategies in Text-to-Speech for Low-Resource Languages
Phat Do, Matt Coler, Jelske Dijkstra et al.
A Thousand Words are Worth More Than One Recording: Word-Embedding Based Speaker Change Detection
Or Haim Anidjar, Itshak Lapidot, Chen Hajaj et al.
Attention-Based Convolutional Neural Network for ASV Spoofing Detection
Hefei Ling, Leichao Huang, Junrui Huang et al.
Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams
Yuanbo Hou, Zhesong Yu, Xia Liang et al.
Attention-Based Keyword Localisation in Speech Using Visual Grounding
Kayode Olaleye, Herman Kamper
A Two-Stage Approach to Speech Bandwidth Extension
Ju Lin, Yun Wang, Kaustubh Kalgaonkar et al.
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu, A. Sophia Koepke, João F. Henriques et al.
Audio Segmentation Based Conversational Silence Detection for Contact Center Calls
Krishnachaitanya Gogineni, Tarun Reddy Yadama, Jithendra Vepa
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments
Hengshun Zhou, Jun Du, Hang Chen et al.
Audio-Visual Multi-Talker Speech Recognition in a Cocktail Party
Yifei Wu, Chenda Li, Song Yang et al.
Audio-Visual Recognition of Emotional Engagement of People with Dementia
Lars Steinert, Felix Putze, Dennis Küster et al.
Audio-Visual Speech Emotion Recognition by Disentangling Emotion and Identity Attributes
Koichiro Ito, Takuya Fujioka, Qinghua Sun et al.
Audiovisual Transfer Learning for Audio Tagging and Sound Event Detection
Wim Boes, Hugo Van hamme
Augmenting Slot Values and Contexts for Spoken Language Understanding with Pretrained Models
Haitao Lin, Lu Xiang, Yu Zhou et al.
A Universal Multi-Speaker Multi-Style Text-to-Speech via Disentangled Representation Learning Based on Rényi Divergence Minimization
Dipjyoti Paul, Sankar Mukherjee, Yannis Pantazis et al.
AusKidTalk: An Auditory-Visual Corpus of 3- to 12-Year-Old Australian Children’s Speech
Beena Ahmed, Kirrie J. Ballard, Denis Burnham et al.
Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Jingsong Wang, Yuxuan He, Chunyu Zhao et al.
Automated Detection of Voice Disorder in the Saarbrücken Voice Database: Effects of Pathology Subset and Audio Materials
Mark Huckvale, Catinca Buciuleac
Automatically Detecting Errors and Disfluencies in Read Speech to Predict Cognitive Impairment in People with Parkinson’s Disease
Amrit Romana, John Bandon, Matthew Perez et al.
Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit
Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos et al.