Papers
Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system
Li Li, Dongxing Xu, Haoran Wei et al.
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords
Yong-Hyeok Lee, Namhyun Cho
PIAVE: A Pose-Invariant Audio-Visual Speaker Extraction Network
Qinghua Liu, Meng Ge, Zhizheng Wu et al.
Pitch Accent Variation and the Interpretation of Rising and Falling Intonation in American English
Thomas Sostarics, Jennifer Cole
Pitch distributions in a very large corpus of spontaneous Finnish speech
Mietta Lennes, Minnaleena Toivola
PLCMOS – A Data-driven Non-intrusive Metric for The Evaluation of Packet Loss Concealment Algorithms
Lorenz Diener, Marju Purin, Sten Sootla et al.
PoCaPNet: A Novel Approach for Surgical Phase Recognition Using Speech and X-Ray Images
Kubilay Can Demir, Tobias Weise, Matthias May et al.
Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
Denise Moussa, Germans Hirsch, Sebastian Wankerl et al.
Powerset multi-class cross entropy loss for neural speaker diarization
Alexis Plaquet, Hervé Bredin
Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks
Felicia Schulz, Mirella De Sisto, M. Paula Roncaglia-Denissen et al.
Prediction of the Gender-based Violence Victim Condition using Speech: What do Machine Learning Models rely on?
Emma Reyner-Fuentes, Esther Rituerto-González, Isabel Trancoso et al.
Preference-based training framework for automatic speech quality assessment using deep neural network
Cheng-Hung Hu, Yusuke Yasuda, Tomoki Toda
Preference Learning Labels by Anchoring on Consecutive Annotations
Abinay Reddy Naini, Ali N. Salman, Carlos Busso
Pre-Finetuning for Few-Shot Emotional Speech Recognition
Maximillian Chen, Zhou Yu
Prefix Search Decoding for RNN Transducers
Kiran Praveen, Advait Vinay Dhopeshwarkar, Abhishek Pandey et al.
Prior-free Guided TTS: An Improved and Efficient Diffusion-based Text-Guided Speech Synthesis
Won-Gook Choi, So-Jeong Kim, TaeHo Kim et al.
Privacy-preserving Representation Learning for Speech Understanding
Minh Tran, Mohammad Soleymani
Privacy Risks in Speech Emotion Recognition: A Systematic Study on Gender Inference Attack
Basmah Alsenani, Tanaya Guha, Alessandro Vinciarelli
Probing Self-supervised Speech Models for Phonetic and Phonemic Information: A Case Study in Aspiration
Kinan Martin, Jon Gauthier, Canaan Breiss et al.
Probing Speech Quality Information in ASR Systems
Bao Thang Ta, Minh Tu Le, Nhat Minh Le et al.
Progress and Prospects for Spoken Language Technology: Results from Five Sexennial Surveys
Roger K. Moore, Ricard Marxer
Promoting Mental Self-Disclosure in a Spoken Dialogue System
Mahdin Rohmatillah, Bobbi Aditya, Li-Jen Yang et al.
Prompt Guided Copy Mechanism for Conversational Question Answering
Yong Zhang, Zhitao Li, Jianzong Wang et al.
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Puyuan Peng, Brian Yan, Shinji Watanabe et al.