Research Explorer

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

Yiyang Zhao, Shuai Wang, Guangzhi Sun et al.

2024 INTERSPEECH

Whister: Using Whisper’s representations for Stuttering detection

Vrushank Changawala, Frank Rudzicz

2024 INTERSPEECH

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data

Hitoshi Suda, Aya Watanabe, Shinnosuke Takamichi

2024 INTERSPEECH

Word-level Text Markup for Prosody Control in Speech Synthesis

Yuliya Korotkova, Ilya Kalinovskiy, Tatiana Vakhrusheva

2024 INTERSPEECH

wTIMIT2mix: A Cocktail Party Mixtures Database to Study Target Speaker Extraction for Normal and Whispered Speech

Marvin Borsdorf, Zexu Pan, Haizhou Li et al.

2024 INTERSPEECH

XANE: eXplainable Acoustic Neural Embeddings

Sri Harsha Dumpala, Dushyant Sharma, Chandramouli Shama Sastry et al.

2024 INTERSPEECH

X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion

Houjian Guo, Chaoran Liu, Carlos Toshinori Ishi et al.

2024 INTERSPEECH

X-Singer: Code-Mixed Singing Voice Synthesis via Cross-Lingual Learning

Ji-Sang Hwang, Hyeongrae Noh, Yoonseok Hong et al.

2024 INTERSPEECH

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

Edresson Casanova, Kelly Davis, Eren Gölge et al.

2024 INTERSPEECH

YOLOPitch: A Time-Frequency Dual-Branch YOLO Model for Pitch Estimation

Xuefei Li, Hao Huang, Ying Hu et al.

2024 INTERSPEECH

YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection

Xuanru Zhou, Anshul Kashyap, Steve Li et al.

2024 INTERSPEECH

Zero-Shot End-To-End Spoken Question Answering In Medical Domain

Yanis Labrak, Adel Moumen, Richard Dufour et al.

2024 INTERSPEECH

Zero-Shot Fake Video Detection by Audio-Visual Consistency

Xiaolou Li, Zehua Liu, Chen Chen et al.

2024 INTERSPEECH

Zero-shot Out-of-domain is No Joke: Lessons Learned in the VoiceMOS 2023 MOS Prediction Challenge

Marie Kunešová, Jan Lehečka, Josef Michálek et al.

2024 INTERSPEECH

ZeroST: Zero-Shot Speech Translation

Sameer Khurana, Chiori Hori, Antoine Laurent et al.

2024 INTERSPEECH

2-bit Conformer quantization for automatic speech recognition

Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding et al.

2023 INTERSPEECH

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Yui Sudo, Shakeel Muhammad, Brian Yan et al.

2023 INTERSPEECH

5G-IoT Cloud based Demonstration of Real-Time Audio-Visual Speech Enhancement for Multimodal Hearing-aids

Ankit Gupta, Abhijeet Bishnu, Mandar Gogate et al.

2023 INTERSPEECH

5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair

Jiarui Lu, Bo-Hsiang Tseng, Joel Ruben Antony Moniz et al.

2023 INTERSPEECH

ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression

Yixin Wan, Yuan Zhou, Xiulian Peng et al.

2023 INTERSPEECH

Aberystwyth English Pre-aspiration in Apparent Time

Míša Michaela Hejná, Adèle Jatteau

2023 INTERSPEECH

A Binary Keyword Spotting System with Error-Diffusion Based Feature Binarization

Dingyi Wang, Mengjie Luo, Lin Li et al.

2023 INTERSPEECH

Abusive Speech Detection in Indic Languages Using Acoustic Features

Anika A. Spiesberger, Andreas Triantafyllopoulos, Iosif Tsangko et al.

2023 INTERSPEECH

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

Jia Qi Yip, Duc-Tuan Truong, Dianwen Ng et al.

2023 INTERSPEECH

Accelerating Transducers through Adjacent Token Merging

Yuang Li, Yu Wu, Jinyu Li et al.

2023 INTERSPEECH
