Papers
Deep Speech Synthesis from Articulatory Representations
Peter Wu, Shinji Watanabe, Louis Goldstein et al.
Deep Transductive Transfer Regression Network for Cross-Corpus Speech Emotion Recognition
Yan Zhao, Jincen Wang, Ru Ye et al.
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura et al.
Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser
Sonal Joshi, Saurabh Kataria, Yiwen Shao et al.
Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification
Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition
Jiamin Xie, John H.L. Hansen
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Ruibin Yuan, Yuxuan Wu, Jacob Li et al.
Deliberation Model for On-Device Spoken Language Understanding
Duc Le, Akshat Shrivastava, Paden D. Tomasello et al.
DelightfulTTS 2: End-to-End Speech Synthesis with Adversarial Vector-Quantized Auto-Encoders
Yanqing Liu, Ruiqing Xue, Lei He et al.
Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech
Yixuan Zhang, Heming Wang, DeLiang Wang
Design Guidelines for Inclusive Speaker Verification Evaluation Datasets
Wiebke Toussaint, Lauriane Gorce, Aaron Yi Ding
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Sebastian Peter Bayerl, Dominik Wagner, Elmar Noeth et al.
Detecting Heart Failure Through Voice Analysis using Self-Supervised Mode-Based Memory Fusion
Darshana Priyasad, Andi Partovi, Sridha Sridharan et al.
Detecting Unintended Memorization in Language-Model-Fused ASR
W. Ronny Huang, Steve Chien, Om Dipakbhai Thakkar et al.
Detection of Learners' Listening Breakdown with Oral Dictation and Its Use to Model Listening Skill Improvement Exclusively Through Shadowing
Takuya Kunihara, Chuanbo Zhu, Daisuke Saito et al.
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Sreyan Ghosh, Samden Lepcha, S Sakshi et al.
Development of allophonic realization until adolescence: A production study of the affricate-fricative variation of /z/ among Japanese children
Sanae Matsui, Kyoji Iwamoto, Reiko Mazuka
Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models
Vineet Garg, Ognjen Rudovic, Pranay Dighe et al.
DF-ResNet: Boosting Speaker Verification Performance with Depth-First Design
Bei Liu, Zhengyang Chen, Shuai Wang et al.
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information
Fumio Nihei, Ryo Ishii, Yukiko Nakano et al.
Differential Time-frequency Log-mel Spectrogram Features for Vision Transformer Based Infant Cry Recognition
Hai-tao Xu, Jie Zhang, Li-rong Dai
Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver
Hideyuki Tachibana, Muneyoshi Inahara, Mocho Go et al.
Directed speech separation for automatic speech recognition of long form conversational speech
Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff et al.
Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments
Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi et al.