Papers
Far-Field Speaker Localization and Adaptive GLMB Tracking
Shoufeng Lin, Zhaojie Luo
FastICARL: Fast Incremental Classifier and Representation Learning with Efficient Budget Allocation in Audio Sensing Applications
Young D. Kwon, Jagmohan Chauhan, Cecilia Mascolo
FastPitchFormant: Source-Filter Based Decomposed Modeling for Speech Synthesis
Taejun Bak, Jae-Sung Bae, Hanbin Bae et al.
Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
Janne Pylkkönen, Antti Ukkonen, Juho Kilpikoski et al.
Fearless Steps Challenge Phase-3 (FSC P3): Advancing SLT for Unseen Channel and Mission Data Across NASA Apollo Audio
Aditya Joglekar, Seyed Omid Sadjadi, Meena Chandra-Shekar et al.
Feature Fusion by Attention Networks for Robust DOA Estimation
Rongliang Liu, Nengheng Zheng, Xi Chen
Federated Learning with Dynamic Transformer for Text to Speech
Zhenhou Hong, Jianzong Wang, Xiaoyang Qu et al.
Few-Shot Keyword Spotting in Any Language
Mark Mazumder, Colby Banbury, Josh Meyer et al.
Few-Shot Learning of New Sound Classes for Target Sound Extraction
Marc Delcroix, Jorge Bennasar Vázquez, Tsubasa Ochiai et al.
Fine-Grained Prosody Modeling in Neural Speech Synthesis Using ToBI Representation
Yuxiang Zou, Shichao Liu, Xiang Yin et al.
Fine-Tuning Pre-Trained Voice Conversion Model for Adding New Target Speakers with Limited Data
Takeshi Koshizuka, Hidefumi Ohmura, Kouichi Katsurada
Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios
Jay Mahadeokar, Yangyang Shi, Yuan Shangguan et al.
Fre-GAN: Adversarial Frequency-Consistent Audio Synthesis
Ji-Hoon Kim, Sang-Hoon Lee, Ji-Hyun Lee et al.
Fricative Phoneme Detection Using Deep Neural Networks and its Comparison to Traditional Methods
Metehan Yurt, Pavan Kantharaju, Sascha Disch et al.
FRILL: A Non-Semantic Speech Embedding for Mobile Devices
Jacob Peplinski, Joel Shor, Sachin Joglekar et al.
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization
Zhengkun Tian, Jiangyan Yi, Ye Bai et al.
F-T-LSTM Based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
Shimin Zhang, Yuxiang Kong, Shubo Lv et al.
Funnel Deep Complex U-Net for Phase-Aware Speech Enhancement
Yuhang Sun, Linju Yang, Huifeng Zhu et al.
Fusion-Net: Time-Frequency Information Fusion Y-Network for Speech Enhancement
Santhan Kumar Reddy Nareddula, Subrahmanyam Gorthi, Rama Krishna Sai S. Gorthi
Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition
Ruirui Li, Chelsea J.-T. Ju, Zeya Chen et al.
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang, Jae-Sung Bae, Taejun Bak et al.
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
Jaeseong You, Dalhyun Kim, Gyuhyeon Nam et al.
Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables
Nadee Seneviratne, Carol Espy-Wilson