Papers
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Guanghou Liu, Yongmao Zhang, Yi Lei et al.
PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text
Yang Yu, Matthew Perez, Ankur Bapna et al.
ProsAudit, a prosodic benchmark for self-supervised speech models
Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux et al.
Prosody-controllable Gender-ambiguous Speech Synthesis: A Tool for Investigating Implicit Bias in Speech Perception
Éva Székely, Joakim Gustafson, Ilaria Torre
Prosody Modeling with 3D Visual Information for Expressive Video Dubbing
Zhihan Yang, Shansong Liu, Xu Li et al.
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon, Changhwan Kim, Eunwoo Song et al.
Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification
Qing Wang, Jixun Yao, Ziqian Wang et al.
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts
Yunxiang Li, Pengfei Liu, Xixin Wu et al.
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Ziyang Ma, Zhisheng Zheng, Guanrou Yang et al.
P-vectors: A Parallel-coupled TDNN/Transformer Network for Speaker Verification
Xiyuan Wang, Fangyuan Wang, Bo Xu et al.
Quantifying Informational Masking due to Masker Intelligibility in Same-talker Speech-in-speech Perception
Mingyue Huo, Yinglun Sun, Dan Fogerty et al.
Quantifying the perceptual value of lexical and non-lexical channels in speech
Sarenne Wallbridge, Peter Bell, Catherine Lai
Quantization-aware and Tensor-compressed Training of Transformers for Natural Language Understanding
Zi Yang, Samridhi Choudhary, Siegfried Kunzmann et al.
Queer Events, Relationships, and Sports: Does Topic Influence Speakers’ Acoustic Expression of Sexual Orientation?
Sven Kachel, Manuel Pöhlmann, Christine Nussbaum
Query Based Acoustic Summarization for Podcasts
Samantha Kotey, Rozenn Dahyot, Naomi Harte
Question-Context Alignment and Answer-Context Dependencies for Effective Answer Sentence Selection
Minh Van Nguyen, Kishan KC, Toan Nguyen et al.
QVoice: Arabic Speech Pronunciation Learning Application
Yassine El Kheir, Fouad Khnaisser, Shammur Absar Chowdhury et al.
RAD-MMM: Multilingual Multiaccented Multispeaker Text To Speech
Rohan Badlani, Rafael Valle, Kevin J. Shih et al.
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Hui Wang, Shiwan Zhao, Xiguang Zheng et al.
Random Forest Classification of Breathing Phases from Audio Signals Recorded using Mobile Devices
Vitória S. Fahed, Emer P. Doheny, Madeleine M. Lowery
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin, Tao Han, Haihua Xu et al.
Range-Based Equal Error Rate for Spoof Localization
Lin Zhang, Xin Wang, Erica Cooper et al.