Papers
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah et al.
Dynamic Data Pruning for Automatic Speech Recognition
Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez et al.
Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
Jingjing Xu, Wei Zhou, Zijian Yang et al.
Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
Longbiao Cheng, Ashutosh Pandey, Buye Xu et al.
DysArinVox: DYSphonia & DYSarthria mandARIN speech corpus
Haojie Zhang, Tao Zhang, Ganjun Liu et al.
Dysarthric Speech Recognition Using Curriculum Learning and Articulatory Feature Embedding
I-Ting Hsieh, Chung-Hsien Wu
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
Julius Richter, Yi-Chiao Wu, Steven Krenn et al.
Echoes of Implicit Bias: Exploring Aesthetics and Social Meanings of Swiss German Dialect Features
Tillmann Pistor, Adrian Leemann
Edged based audio-visual speech enhancement demonstrator
Song Chen, Mandar Gogate, Kia Dashtipour et al.
ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting
Zeyang Song, Qianhui Liu, Qu Yang et al.
EEND-M2F: Masked-attention mask transformers for speaker diarization
Marc Härkönen, Samuel J. Broughton, Lahiru Samarakoon
Effect of Complex Boundary Tones on Tone Identification: An Experimental Study with Mandarin-speaking Preschool Children
Aijun Li, Jun Gao, Zhiwei Wang
Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer
Tomoki Honda, Shinsuke Sakai, Tatsuya Kawahara
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Xuenan Xu, Haohe Liu, Mengyue Wu et al.
Efficient CNNs with Quaternion Transformations and Pruning for Audio Tagging
Aryan Chaudhary, Arshdeep Singh, Vinayak Abrol et al.
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti
Efficient Integrated Features Based on Pre-trained Models for Speaker Verification
Yishuang Li, Wenhao Guan, Hukai Huang et al.
Efficient Joint Beamforming and Acoustic Echo Cancellation Structure for Conference Call Scenarios
Ofer Schwartz, Sharon Gannot
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping
Lun Wang, Om Thakkar, Zhong Meng et al.
Efficient Speaker Embedding Extraction Using a Twofold Sliding Window Algorithm for Speaker Diarization
Jeong-Hwan Choi, Ye-Rin Jeoung, Ilseok Kim et al.
Efficient SQA from Long Audio Contexts: A Policy-driven Approach
Alexander Johnson, Peter Plantinga, Pheobe Sun et al.
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
Tejes Srivastava, Jiatong Shi, William Chen et al.
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng, Mehmet Hamza Erol, Joon Son Chung et al.
Electroglottography for the assessment of dysphonia in Parkinson's disease and multiple system atrophy
Khalid Daoudi, Solange Milhé de Saint Victor, Alexandra Foubert-Samier et al.