Papers
Active Few-Shot Learning for Sound Event Detection
Yu Wang, Mark Cartwright, Juan Pablo Bello
Adaptive multilingual speech recognition with pretrained models
Ngoc-Quan Pham, Alexander Waibel, Jan Niehues
Adaptive Rectangle Loss for Speaker Verification
Li Ruida, Fang Shuo, Ma Chenguang et al.
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu, Xu Tan, Bohan Li et al.
AdaVocoder: Adaptive Vocoder for Custom Voice
Xin Yuan, Robin Feng, Mingming Ye et al.
A deep complex multi-frame filtering network for stereophonic acoustic echo cancellation
Linjuan Cheng, Chengshi Zheng, Andong Li et al.
A Deep Learning Platform for Language Education Research and Development
Kye Min Tan, Richeng Duan, Xin Huang et al.
A Deep One-Class Learning Method for Replay Attack Detection
Yijie Lou, Shiliang Pu, Jianfeng Zhou et al.
ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition
Zi Huang, Shulei Ji, Zhilan Hu et al.
Advanced Speaker Embedding with Predictive Variance of Gaussian Distribution for Speaker Adaptation in TTS
Jaeuk Lee, Joon-Hyuk Chang
Adversarial and Sequential Training for Cross-lingual Prosody Transfer TTS
Min-Kyung Kim, Joon-Hyuk Chang
Adversarial-Free Speaker Identity-Invariant Representation Learning for Automatic Dysarthric Speech Classification
Parvaneh Janbakhshi, Ina Kodrasi
Adversarial Knowledge Distillation For Robust Spoken Language Understanding
Ye Wang, Baishun Ling, Yanmeng Wang et al.
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay
Claus Larsen, Peter Koch, Zheng-Hua Tan
Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Tae-Woo Kim, Min-Su Kang, Gyeong-Hoon Lee
Adversarial Reweighting for Speaker Verification Fairness
Minho Jin, Chelsea Ju, Zeya Chen et al.
AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification
Sonal Joshi, Saurabh Kataria, Jesús Villalba et al.
A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition
Ying Hu, Yuwu Tang, Hao Huang et al.
A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion
Xu Li, Shansong Liu, Ying Shan
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
Linh The Nguyen, Nguyen Luong Tran, Long Doan et al.
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Zexu Pan, Meng Ge, Haizhou Li
Air tissue boundary segmentation using regional loss in real-time Magnetic Resonance Imaging video for speech production
Anwesha Roy, Varun Belagali, Prasanta Ghosh
A Language Agnostic Multilingual Streaming On-Device ASR System
Bo Li, Tara Sainath, Ruoming Pang et al.
A Laryngographic Study on the Voice Quality of Northern Vietnamese Tones under the Lombard Effect
Giang Le, Chilin Shih, Yan Tang