Papers
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen, Shuzhou Chai, Guan-Bo Wang et al.
GlobalPhone Mix-To-Separate Out of 2: A Multilingual 2000 Speakers Mixtures Database for Speech Separation
Marvin Borsdorf, Chenglin Xu, Haizhou Li et al.
Glottal Sounds in Korebaju
Jenifer Vega Rodriguez, Nathalie Vallée
Glottal Stops in Upper Sorbian: A Data-Driven Approach
Ivan Kraljevski, Maria Paola Bissiri, Frank Duckhorn et al.
Glow-WaveGAN: Learning Speech Representations from GAN-Based Variational Auto-Encoder for High Fidelity Flow-Based Speech Synthesis
Jian Cong, Shan Yang, Lei Xie et al.
Golos: Russian Dataset for Speech Research
Nikolay Karpov, Alexander Denisenko, Fedor Minkin
Gradient Regularization for Noise-Robust Speaker Verification
Jianchen Li, Jiqing Han, Hongwei Song
Graph Attention Networks for Anti-Spoofing
Hemlata Tak, Jee-weon Jung, Jose Patino et al.
Graph-Based Label Propagation for Semi-Supervised Speaker Identification
Long Chen, Venkatesh Ravichandran, Andreas Stolcke
Graph Isomorphism Network for Speech Emotion Recognition
Jiawang Liu, Haoxiang Wang
Graph-PIT: Generalized Permutation Invariant Training for Continuous Separation of Arbitrary Numbers of Speakers
Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker et al.
Group Delay Based Re-Weighted Sparse Recovery Algorithms for Robust and High-Resolution Source Separation in DOA Framework
Murtiza Ali, Ashwani Koul, Karan Nathwani
Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi, Ye Bai, Jianhua Tao et al.
Handling Acoustic Variation in Dysarthric Speech Recognition Systems Through Model Combination
Enno Hermann, Mathew Magimai-Doss
Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator
Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari
Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech
Jae-Sung Bae, Taejun Bak, Young-Sun Joo et al.
Hierarchical Phone Recognition with Compositional Phonetics
Xinjian Li, Juncheng Li, Florian Metze et al.
Hi-Fi Multi-Speaker English TTS Dataset
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg et al.
High-Fidelity Parallel WaveGAN with Multi-Band Harmonic-Plus-Noise Model
Min-Jae Hwang, Ryuichi Yamamoto, Eunwoo Song et al.
HMM-Free Encoder Pre-Training for Streaming RNN Transducer
Lu Huang, Jingyu Sun, Yufeng Tang et al.
How f0 and Phrase Position Affect Papuan Malay Word Identification
Constantijn Kaland, Matthew Gordon
How Reliable Are Phonetic Data Collected Remotely? Comparison of Recording Devices and Environments on Acoustic Measurements
Chunyu Ge, Yixuan Xiong, Peggy Mok
Human-in-the-Loop Efficiency Analysis for Binary Classification in Edyson
Per Fallgren, Jens Edlund
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement
Sefik Emre Eskimez, Xiaofei Wang, Min Tang et al.