Papers
Voice Privacy Through x-Vector and CycleGAN-Based Anonymization
Gauri P. Prajapati, Dipesh K. Singh, Preet P. Amin et al.
Voicing Contrasts in the Singleton Stops of Palestinian Arabic: Production and Perception
Nour Tamim, Silke Hamann
Voting for the Right Answer: Adversarial Defense for Speaker Verification
Haibin Wu, Yang Zhang, Zhiyong Wu et al.
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion
Disong Wang, Liqun Deng, Yu Ting Yeung et al.
wav2vec-C: A Self-Supervised Model for Speech Representation Learning
Samik Sadhu, Di He, Che-Wei Huang et al.
WavBERT: Exploiting Semantic and Non-Semantic Speech Using Wav2vec and BERT for Dementia Detection
Youxiang Zhu, Abdelrahman Obyat, Xiaohui Liang et al.
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen, Yu Zhang, Heiga Zen et al.
Weakly Supervised Construction of ASR Systems from Massive Video Data
Mengli Cheng, Chengyu Wang, Jun Huang et al.
Weakly-Supervised Speech-to-Text Mapping with Visually Connected Non-Parallel Speech-Text Data Using Cyclic Partially-Aligned Transformer
Johanes Effendi, Sakriani Sakti, Satoshi Nakamura
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman et al.
Web Interface for Estimating Articulatory Movements in Speech Production from Acoustics and Text
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh et al.
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit
Zhuoyuan Yao, Di Wu, Xiong Wang et al.
Whisper Speech Enhancement Using Joint Variational Autoencoder for Improved Speech Recognition
Vikas Agrawal, Shashi Kumar, Shakti P. Rath
WittyKiddy: Multilingual Spoken Language Learning for Kids
Ke Shi, Kye Min Tan, Huayun Zhang et al.
Word Competition: An Entropy-Based Approach in the DIANA Model of Human Word Comprehension
Louis ten Bosch, Lou Boves
WSRGlow: A Glow-Based Waveform Generative Model for Audio Super-Resolution
Kexun Zhang, Yi Ren, Changliang Xu et al.
X-net: A Joint Scale Down and Scale Up Method for Voice Call
Liang Wen, Lizhong Wang, Xue Wen et al.
Y2-Net FCRN for Acoustic Echo and Noise Suppression
Ernst Seidel, Jan Franzen, Maximilian Strake et al.
“You don’t understand me!”: Comparing ASR Results for L1 and L2 Speakers of Swedish
Ronald Cumbal, Birger Moell, José Lopes et al.
Y-Vector: Multiscale Waveform Encoder for Speaker Embedding
Ge Zhu, Fei Jiang, Zhiyao Duan
Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding
Heting Gao, Junrui Ni, Yang Zhang et al.
Zero-Shot Federated Learning with New Classes for Audio Classification
Gautham Krishna Gudur, Satheesh Kumar Perepu
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens
Mana Ihori, Naoki Makishima, Tomohiro Tanaka et al.