Papers
8,761 papers found
Optimizing Large-Scale Context Retrieval for End-to-End ASR
Zhiqi Huang, Diamantino Caseiro, Kandarp Joshi et al.
Optimizing the role of human evaluation in LLM-based spoken document summarization systems
Margaret Kroll, Kelsey Kraus
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed, Oli Danyi Liu, Hao Tang et al.
OR-TSE: An Overlap-Robust Speaker Encoder for Target Speech Extraction
Yiru Zhang, Linyu Yao, Qun Yang
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, Korbinian Riedhammer et al.
Out-of-distribution generalisation in spoken language understanding
Dejan Porjazovski, Anssi Moisio, Mikko Kurimo
Oversampling, Augmentation and Curriculum Learning for Speaking Assessment with Limited Training Data
Tin Mei Lun, Ekaterina Voskoboinik, Ragheb Al-Ghezi et al.
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
Yifan Peng, Jinchuan Tian, William Chen et al.
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde et al.
ParaCLAP – Towards a general language-audio model for computational paralinguistic tasks
Xin Jing, Andreas Triantafyllopoulos, Björn Schuller
Parameter-Efficient Adapter Based on Pre-trained Models for Speech Translation
Nan Chen, Yonghe Wang, Feilong Bao
Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification
Zhe Li, Man-wai Mak, Hung-yi Lee et al.
PARAN: Variational Autoencoder-based End-to-End Articulation-to-Speech System for Speech Intelligibility
Seyun Um, Doyeon Kim, Hong-Goo Kang
PARIS: Pseudo-AutoRegressIve Siamese Training for Online Speech Separation
Zexu Pan, Gordon Wichern, François G. Germain et al.
Participant-Pair-Wise Bottleneck Transformer for Engagement Estimation from Video Conversation
Keita Suzuki, Nobukatsu Hojo, Kazutoshi Shinoda et al.
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Yicong Jiang, Tianzi Wang, Xurong Xie et al.
Perception of music and speech: Focus on rhythm processing
Barbara Tillmann
Perceptual Learning in Lexical Tone: Phonetic Similarity vs. Phonological Categories
Ariëlle Reitsema, Chenxin Li, Leanne van Lambalgen et al.
Performant ASR Models for Medical Entities in Accented Speech
Tejumade Afonja, Tobi Olatunji, Sewade Ogun et al.
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Taewoo Kim, Choonsang Cho, Young Han Lee
PERSONA: an application for emotion recognition, gender recognition and age estimation
Devyani Koshal, Orchid Chetia Phukan, Sarthak Jain et al.
Personality-memory Gated Adaptation: An Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition
Yue Gu, Zhihao Du, Shiliang Zhang et al.
Personalized Speech Enhancement Without a Separate Speaker Embedding Model
Tanel Pärnamaa, Ando Saabas
PFCA-Net: Pyramid Feature Fusion and Cross Content Attention Network for Automated Audio Captioning
Jianyuan Sun, Wenwu Wang, Mark D. Plumbley
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
Shubham Gupta, Mirco Ravanelli, Pascal Germain et al.