Research Explorer

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah et al.

2024 INTERSPEECH

Dynamic Data Pruning for Automatic Speech Recognition

Qiao Xiao, Pingchuan Ma, Adriana Fernandez-Lopez et al.

2024 INTERSPEECH

Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition

Jingjing Xu, Wei Zhou, Zijian Yang et al.

2024 INTERSPEECH

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

Longbiao Cheng, Ashutosh Pandey, Buye Xu et al.

2024 INTERSPEECH

DysArinVox: DYSphonia & DYSarthria mandARIN speech corpus

Haojie Zhang, Tao Zhang, Ganjun Liu et al.

2024 INTERSPEECH

Dysarthric Speech Recognition Using Curriculum Learning and Articulatory Feature Embedding

I-Ting Hsieh, Chung-Hsien Wu

2024 INTERSPEECH

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter, Yi-Chiao Wu, Steven Krenn et al.

2024 INTERSPEECH

Echoes of Implicit Bias: Exploring Aesthetics and Social Meanings of Swiss German Dialect Features

Tillmann Pistor, Adrian Leemann

2024 INTERSPEECH

Edged based audio-visual speech enhancement demonstrator

Song Chen, Mandar Gogate, Kia Dashtipour et al.

2024 INTERSPEECH

ED-sKWS: Early-Decision Spiking Neural Networks for Rapid, and Energy-Efficient Keyword Spotting

Zeyang Song, Qianhui Liu, Qu Yang et al.

2024 INTERSPEECH

EEND-M2F: Masked-attention mask transformers for speaker diarization

Marc Härkönen, Samuel J. Broughton, Lahiru Samarakoon

2024 INTERSPEECH

Effect of Complex Boundary Tones on Tone Identification: An Experimental Study with Mandarin-speaking Preschool Children

Aijun Li, Jun Gao, Zhiwei Wang

2024 INTERSPEECH

Effects of talker and playback rate of reverberation-induced speech on speech intelligibility of older adults

Nao Hodoshima

2024 INTERSPEECH

Efficient and Robust Long-Form Speech Recognition with Hybrid H3-Conformer

Tomoki Honda, Shinsuke Sakai, Tatsuya Kawahara

2024 INTERSPEECH

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Xuenan Xu, Haohe Liu, Mengyue Wu et al.

2024 INTERSPEECH

Efficient CNNs with Quaternion Transformations and Pruning for Audio Tagging

Aryan Chaudhary, Arshdeep Singh, Vinayak Abrol et al.

2024 INTERSPEECH

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

Umberto Cappellazzo, Daniele Falavigna, Alessio Brutti

2024 INTERSPEECH

Efficient Integrated Features Based on Pre-trained Models for Speaker Verification

Yishuang Li, Wenhao Guan, Hukai Huang et al.

2024 INTERSPEECH

Efficient Joint Beamforming and Acoustic Echo Cancellation Structure for Conference Call Scenarios

Ofer Schwartz, Sharon Gannot

2024 INTERSPEECH

Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

Lun Wang, Om Thakkar, Zhong Meng et al.

2024 INTERSPEECH

Efficient Speaker Embedding Extraction Using a Twofold Sliding Window Algorithm for Speaker Diarization

Jeong-Hwan Choi, Ye-Rin Jeoung, Ilseok Kim et al.

2024 INTERSPEECH

Efficient SQA from Long Audio Contexts: A Policy-driven Approach

Alexander Johnson, Peter Plantinga, Pheobe Sun et al.

2024 INTERSPEECH

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios

Tejes Srivastava, Jiatong Shi, William Chen et al.

2024 INTERSPEECH

ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions

Jiu Feng, Mehmet Hamza Erol, Joon Son Chung et al.

2024 INTERSPEECH

Electroglottography for the assessment of dysphonia in Parkinson's disease and multiple system atrophy

Khalid Daoudi, Solange Milhé de Saint Victor, Alexandra Foubert-Samier et al.

2024 INTERSPEECH
