Papers
Domain Adaptive Self-supervised Training of Automatic Speech Recognition
Cong-Thanh Do, Rama Doddipatla, Mohan Li et al.
Don’t Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters
Anshu Bhatia, Sanchit Sinha, Saket Dingliwal et al.
Do Phonatory Features Display Robustness to Characterize Parkinsonian Speech Across Corpora?
Anna Favaro, Tianyu Cao, Thomas Thebaud et al.
DoubleDeceiver: Deceiving the Speaker Verification System Protected by Spoofing Countermeasures
Mengao Zhang, Ke Xu, Hao Li et al.
Do Vocal Breath Sounds Encode Gender Cues for Automatic Gender Classification?
Mohammad Shaique Solanki, Ashutosh Bharadwaj, Jeevan Kylash et al.
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai et al.
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Yifan Peng, Yui Sudo, Shakeel Muhammad et al.
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu, Yiwei Guo, Chenpeng Du et al.
Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition
Zhao Yang, Dianwen Ng, Chong Zhang et al.
Dual Audio Encoders Based Mandarin Prosodic Boundary Prediction by Using Multi-Granularity Prosodic Representations
Ruishan Li, Yingming Gao, Yanlu Xie et al.
Dual Memory Fusion for Multimodal Speech Emotion Recognition
Darshana Priyasad, Tharindu Fernando, Sridha Sridharan et al.
Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement
Zhao Yang, Dianwen Ng, Xizhe Li et al.
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon et al.
Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition
Yuchen Hu, Nana Hou, Chen Chen et al.
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Jianyuan Sun, Xubo Liu, Xinhao Mei et al.
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu et al.
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model
Helin Wang, Thomas Thebaud, Jesús Villalba et al.
Dynamic Encoder RNN for Online Voice Activity Detection in Adverse Noise Conditions
Prithvi R.R. Gudepu, Jayesh M. Koroth, Kamini Sabu et al.
Dynamic Fully-Connected Layer for Large-Scale Speaker Verification
Zhida Song, Liang He, Baowei Zhao et al.
Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra
Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic
E2E-S2S-VC: End-To-End Sequence-To-Sequence Voice Conversion
Takuma Okamoto, Tomoki Toda, Hisashi Kawai
ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification
Bei Liu, Yanmin Qian
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Ammar Abbas, Sri Karlapati, Bastian Schnell et al.
EdenTTS: A Simple and Efficient Parallel Text-to-speech Architecture with Collaborative Duration-alignment Learning
Youneng Ma, Junyi He, Meimei Wu et al.