Papers
DiffVC+: Improving Diffusion-based Voice Conversion for Speaker Anonymization
Fan Huang, Kun Zeng, Wei Zhu
DINO-VITS: Data-Efficient Zero-Shot TTS with Self-Supervised Speaker Verification Loss for Noise Robustness
Vikentii Pankov, Valeria Pronina, Alexander Kuzmin et al.
Direct Speech Synthesis from Non-Invasive, Neuromagnetic Signals
Jinuk Kwon, David Harwath, Debadatta Dash et al.
Dirichlet process mixture model based on topologically augmented signal representation for clustering infant vocalizations
Guillem Bonafos, Clara Bourot, Pierre Pudlo et al.
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon, Kwangyoun Kim, Yi-Te Hsu et al.
Disentangled Representation Learning for Environment-agnostic Speaker Recognition
KiHyun Nam, Hee-Soo Heo, Jee-weon Jung et al.
Disentangling Age and Identity with a Mutual Information Minimization for Cross-Age Speaker Verification
Fengrun Zhang, Wangjin Zhou, Yiming Liu et al.
Disentangling prosody and timbre embeddings via voice conversion
Nicolas Gengembre, Olivier Le Blouch, Cédric Gendrot
Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
Hyeonuk Nam, Seong-Hu Kim, Deokki Min et al.
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
Baihan Li, Zeyu Xie, Xuenan Xu et al.
DNSMOS Pro: A Reduced-Size DNN for Probabilistic MOS of Speech
Fredrik Cumlin, Xinyu Liang, Victor Ungureanu et al.
Does the Lombard Effect Matter in Speech Separation? Introducing the Lombard-GRID-2mix Dataset
Iva Ewert, Marvin Borsdorf, Haizhou Li et al.
Domain Adaptation for Contrastive Audio-Language Models
Soham Deshmukh, Rita Singh, Bhiksha Raj
Domain-Aware Data Selection for Speech Classification via Meta-Reweighting
Junghun Kim, Ka Hyun Park, Hoyoung Yoon et al.
Do Speaker-dependent Vowel Characteristics depend on Speech Style?
Nicolas Audibert, Cecile Fougeron, Christine Meunier
Do we EXPECT TO find phonetic traces for syntactic traces?
Jonathan Him Nok Lee, Mark Liberman, Martin Salzmann
DreamVoice: Text-Guided Voice Conversion
Jiarui Hai, Karan Thakkar, Helin Wang et al.
DropFormer: A Dynamic Noise-Dropping Transformer for Speech Emotion Recognition
Jialong Mai, Xiaofen Xing, Weidong Chen et al.
Dual-Constrained Dynamical Neural ODEs for Ambiguity-aware Continuous Emotion Prediction
Jingyao Wu, Ting Dang, Vidhyasaharan Sethu et al.
Dual-path Adaptation of Pretrained Feature Extraction Module for Robust Automatic Speech Recognition
Hao Shi, Tatsuya Kawahara
Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen et al.
DualPure: An Efficient Adversarial Purification Method for Speech Command Recognition
Hao Tan, Xiaochen Liu, Huan Zhang et al.
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
Jinhyeok Yang, Junhyeok Lee, Hyeong-Seok Choi et al.
DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion
Ziqian Ning, Shuai Wang, Pengcheng Zhu et al.