Papers
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
Yiyang Zhao, Shuai Wang, Guangzhi Sun et al.
Whister: Using Whisper’s representations for Stuttering detection
Vrushank Changawala, Frank Rudzicz
Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
Hitoshi Suda, Aya Watanabe, Shinnosuke Takamichi
Word-level Text Markup for Prosody Control in Speech Synthesis
Yuliya Korotkova, Ilya Kalinovskiy, Tatiana Vakhrusheva
wTIMIT2mix: A Cocktail Party Mixtures Database to Study Target Speaker Extraction for Normal and Whispered Speech
Marvin Borsdorf, Zexu Pan, Haizhou Li et al.
XANE: eXplainable Acoustic Neural Embeddings
Sri Harsha Dumpala, Dushyant Sharma, Chandramouli Shama Sastry et al.
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi et al.
X-Singer: Code-Mixed Singing Voice Synthesis via Cross-Lingual Learning
Ji-Sang Hwang, Hyeongrae Noh, Yoonseok Hong et al.
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova, Kelly Davis, Eren Gölge et al.
YOLOPitch: A Time-Frequency Dual-Branch YOLO Model for Pitch Estimation
Xuefei Li, Hao Huang, Ying Hu et al.
YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection
Xuanru Zhou, Anshul Kashyap, Steve Li et al.
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak, Adel Moumen, Richard Dufour et al.
Zero-Shot Fake Video Detection by Audio-Visual Consistency
Xiaolou Li, Zehua Liu, Chen Chen et al.
Zero-shot Out-of-domain is No Joke: Lessons Learned in the VoiceMOS 2023 MOS Prediction Challenge
Marie Kunešová, Jan Lehečka, Josef Michálek et al.
ZeroST: Zero-Shot Speech Translation
Sameer Khurana, Chiori Hori, Antoine Laurent et al.
2-bit Conformer quantization for automatic speech recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding et al.
4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders
Yui Sudo, Shakeel Muhammad, Brian Yan et al.
5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids
Ankit Gupta, Abhijeet Bishnu, Mandar Gogate et al.
5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair
Jiarui Lu, Bo-Hsiang Tseng, Joel Ruben Antony Moniz et al.
ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression
Yixin Wan, Yuan Zhou, Xiulian Peng et al.
Aberystwyth English Pre-aspiration in Apparent Time
Míša Michaela Hejná, Adèle Jatteau
A Binary Keyword Spotting System with Error-Diffusion Based Feature Binarization
Dingyi Wang, Mengjie Luo, Lin Li et al.
Abusive Speech Detection in Indic Languages Using Acoustic Features
Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko et al.
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention
Jia Qi Yip, Duc-Tuan Truong, Dianwen Ng et al.
Accelerating Transducers through Adjacent Token Merging
Yuang Li, Yu Wu, Jinyu Li et al.