Research Explorer

Optimizing Large-Scale Context Retrieval for End-to-End ASR

Zhiqi Huang, Diamantino Caseiro, Kandarp Joshi et al.

2024 INTERSPEECH

Optimizing the role of human evaluation in LLM-based spoken document summarization systems

Margaret Kroll, Kelsey Kraus

2024 INTERSPEECH

Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations

Mukhtar Mohamed, Oli Danyi Liu, Hao Tang et al.

2024 INTERSPEECH

OR-TSE: An Overlap-Robust Speaker Encoder for Target Speech Extraction

Yiru Zhang, Linyu Yao, Qun Yang

2024 INTERSPEECH

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

Dominik Wagner, Ilja Baumann, Korbinian Riedhammer et al.

2024 INTERSPEECH

Out-of-distribution generalisation in spoken language understanding

Dejan Porjazovski, Anssi Moisio, Mikko Kurimo

2024 INTERSPEECH

Oversampling, Augmentation and Curriculum Learning for Speaking Assessment with Limited Training Data

Tin Mei Lun, Ekaterina Voskoboinik, Ragheb Al-Ghezi et al.

2024 INTERSPEECH

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Yifan Peng, Jinchuan Tian, William Chen et al.

2024 INTERSPEECH

PAM: Prompting Audio-Language Models for Audio Quality Assessment

Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde et al.

2024 INTERSPEECH

ParaCLAP – Towards a general language-audio model for computational paralinguistic tasks

Xin Jing, Andreas Triantafyllopoulos, Björn Schuller

2024 INTERSPEECH

Parameter-Efficient Adapter Based on Pre-trained Models for Speech Translation

Nan Chen, Yonghe Wang, Feilong Bao

2024 INTERSPEECH

Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification

Zhe Li, Man-wai Mak, Hung-yi Lee et al.

2024 INTERSPEECH

PARAN: Variational Autoencoder-based End-to-End Articulation-to-Speech System for Speech Intelligibility

Seyun Um, Doyeon Kim, Hong-Goo Kang

2024 INTERSPEECH

PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation

Zexu Pan, Gordon Wichern, François G. Germain et al.

2024 INTERSPEECH

Participant-Pair-Wise Bottleneck Transformer for Engagement Estimation from Video Conversation

Keita Suzuki, Nobukatsu Hojo, Kazutoshi Shinoda et al.

2024 INTERSPEECH

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Yicong Jiang, Tianzi Wang, Xurong Xie et al.

2024 INTERSPEECH

Perception of music and speech: Focus on rhythm processing

Barbara Tillmann

2024 INTERSPEECH

Perceptual Learning in Lexical Tone: Phonetic Similarity vs. Phonological Categories

Ariëlle Reitsema, Chenxin Li, Leanne van Lambalgen et al.

2024 INTERSPEECH

Performant ASR Models for Medical Entities in Accented Speech

Tejumade Afonja, Tobi Olatunji, Sewade Ogun et al.

2024 INTERSPEECH

Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis

Taewoo Kim, Choonsang Cho, Young Han Lee

2024 INTERSPEECH

PERSONA: an application for emotion recognition, gender recognition and age estimation

Devyani Koshal, Orchid Chetia Phukan, Sarthak Jain et al.

2024 INTERSPEECH

Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition

Yue Gu, Zhihao Du, Shiliang Zhang et al.

2024 INTERSPEECH

Personalized Speech Enhancement Without a Separate Speaker Embedding Model

Tanel Pärnamaa, Ando Saabas

2024 INTERSPEECH

PFCA-Net: Pyramid Feature Fusion and Cross Content Attention Network for Automated Audio Captioning

Jianyuan Sun, Wenwu Wang, Mark D. Plumbley

2024 INTERSPEECH

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice

Shubham Gupta, Mirco Ravanelli, Pascal Germain et al.

2024 INTERSPEECH
