Papers
Formant Estimation and Tracking using Probabilistic Heat-Maps
Yosi Shrem, Felix Kreuk, Joseph Keshet
Frame-Level Stutter Detection
John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo
Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection
Hyeonuk Nam, Seong-Hu Kim, Byeong-Yun Ko et al.
From Disfluency Detection to Intent Detection and Slot Filling
Mai Hoang Dao, Thinh Truong, Dat Quoc Nguyen
From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini, Alicia Lozano-Diez, Mireia Diez et al.
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu, Changhan Wang, Hongyu Gong et al.
From Undercomplete to Sparse Overcomplete Autoencoders to Improve LF-MMI based Speech Recognition
Selen Hande Kabil, Herve Bourlard
Fully Automatic Balance between Directivity Factor and White Noise Gain for Large-scale Microphone Arrays in Diffuse Noise Fields
Weixin Meng, Chengshi Zheng, Xiaodong Li
Fundamental Frequency Variability over Time in Telephone Interactions
Leah Bradshaw, Eleanor Chodroff, Lena Jäger et al.
Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang, Wangjin Zhou, Chenhui Chu et al.
g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin
Yi-Chang Chen, Yu-Chuan Steven, Yen-Cheng Chang et al.
Gated Convolutional Fusion for Time-Domain Target Speaker Extraction Network
Wenjing Liu, Chuan Xie
Generalized Keyword Spotting using ASR embeddings
Kirandevraj R, Vinod Kumar Kurmi, Vinay Namboodiri et al.
Generalizing RNN-Transducer to Out-Domain Audio via Sparse Self-Attention Layers
Juntae Kim, Jeehye Lee
Generating gender-ambiguous voices for privacy-preserving speech recognition
Dimitrios Stoidis, Andrea Cavallaro
Generating iso-accented stimuli for second language research: methodology and a dataset for Spanish-accented English
Rubén Pérez Ramón, Martin Cooke, Maria Luisa Garcia Lecumberri
Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition
Shijun Wang, Hamed Hemati, Jón Guðnason et al.
GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block
Xinmeng Xu, Yang Wang, Jie Jia et al.
Global RNN Transducer Models For Multi-dialect Speech Recognition
Takashi Fukuda, Samuel Thomas, Masayuki Suzuki et al.
Global Signal-to-noise Ratio Estimation Based on Multi-subband Processing Using Convolutional Neural Network
Nan Li, Meng Ge, Longbiao Wang et al.
Glottal inverse filtering based on articulatory synthesis and deep learning
Ingo Langheinrich, Simon Stone, Xinyu Zhang et al.
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
Magdalena Proszewska, Grzegorz Beringer, Daniel Sáez-Trigueros et al.
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yi Lei, Shan Yang, Jian Cong et al.
Gradual Improvements Observed in Learners' Perception and Production of L2 Sounds Through Continuing Shadowing Practices on a Daily Basis
Takuya Kunihara, Chuanbo Zhu, Nobuaki Minematsu et al.
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi
Anish Bhanushali, Grant Bridgman, Deekshitha G et al.