Papers
Speaker- and Phone-aware Convolutional Transformer Network for Acoustic Echo Cancellation
Chang Han, Weiping Tu, Yuhong Yang et al.
Speaker Anonymization with Phonetic Intermediate Representations
Sarina Meyer, Florian Lux, Pavel Denisov et al.
Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction
Zifeng Zhao, Rongzhi Gu, Dongchao Yang et al.
Speaker conditioned acoustic modeling for multi-speaker conversational ASR
Srikanth Raj Chetupalli, Sriram Ganapathy
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
Naoki Makishima, Satoshi Suzuki, Atsushi Ando et al.
Speaker recognition-assisted robust audio deepfake detection
Jiahui Pan, Shuai Nie, Hui Zhang et al.
Speaker-Specific Utterance Ensemble based Transfer Attack on Speaker Identification
Chu-Xiao Zuo, Jia-Yi Leng, Wu-Jun Li
Speaker Trait Enhancement for Cochlear Implant Users: A Case Study for Speaker Emotion Perception
Avamarie Brueggeman, John H.L. Hansen
Speaking Rate Control of end-to-end TTS Models by Direct Manipulation of the Encoder's Output Embeddings
Martin Lenglet, Olivier Perrotin, Gérard Bailly
Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Tuan Vu Ho, Maori Kobayashi, Masato Akagi
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
Yuma Koizumi, Heiga Zen, Kohei Yatabe et al.
Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children’s Speech
Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya et al.
Spectro-Temporal SubNet for Real-Time Monaural Speech Denoising and Dereverberation
Feifei Xiong, Weiguang Chen, Pengyu Wang et al.
Speech2Slot: A Limited Generation Framework with Boundary Detection for Slot Filling from Speech
Pengwei Wang, Yinpei Su, Xiaohuan Zhou et al.
Speech Acoustics in Mild Cognitive Impairment and Parkinson's Disease With and Without Concurrent Drawing Tasks
Tanya Talkar, Christina Manxhari, James Williamson et al.
Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression
Salvatore Fara, Stefano Goria, Emilia Molimpakis et al.
Speech Audio Corrector: using speech from non-target speakers for one-off correction of mispronunciations in grapheme-input text-to-speech
Jason Fong, Daniel Lyth, Gustav Eje Henter et al.
Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation
Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha et al.
Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
Jack Parry, Eric DeMattos, Anita Klementiev et al.
Speech Emotion Recognition via Generation using an Attention-based Variational Recurrent Neural Network
Murchana Baruah, Bonny Banerjee
Speech Enhancement with Fullband-Subband Cross-Attention Network
Jun Chen, Wei Rao, Zilin Wang et al.
Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain
Simon Welker, Julius Richter, Timo Gerkmann
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Zuheng Kang, Junqing Peng, Jianzong Wang et al.
SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech
Weidong Chen, Xiaofen Xing, Xiangmin Xu et al.
Speech imitation skills predict automatic phonetic convergence: a GMM-UBM study on L2
Dorina de Jong, Aldo Pastore, Noël Nguyen et al.