Papers
Out of a Hundred Trials, How Many Errors Does Your Speaker Verifier Make?
Niko Brümmer, Luciana Ferrer, Albert Swart
Out-of-Vocabulary Words Detection with Attention and CTC Alignments in an End-to-End ASR System
Ekaterina Egorova, Hari Krishna Vydana, Lukáš Burget et al.
Overlapped Speech Detection Based on Spectral and Spatial Feature Fusion
Weiguang Chen, Van Tung Pham, Eng Siong Chng et al.
Pairing Weak with Strong: Twin Models for Defending Against Adversarial Attack on Speaker Verification
Zhiyuan Peng, Xu Li, Tan Lee
PANACEA Cough Sound-Based Diagnosis of COVID-19 for the DiCOVA 2021 Challenge
Madhu R. Kamble, Jose A. Gonzalez-Lopez, Teresa Grau et al.
Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Isaac Elias, Heiga Zen, Jonathan Shen et al.
Parametric Distributions to Model Numerical Emotion Labels
Deboshree Bose, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Paraphrase Label Alignment for Voice Application Retrieval in Spoken Language Understanding
Zheng Gao, Radhika Arava, Qian Hu et al.
Parental Spoken Scaffolding and Narrative Skills in Crowd-Sourced Storytelling Samples of Young Children
Zhengjun Yue, Jon Barker, Heidi Christensen et al.
Parsing Speech for Grouping and Prominence, and the Typology of Rhythm
Michael Wagner, Alvaro Iturralde Zurita, Sijia Zhang
Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection
Wanying Ge, Michele Panariello, Jose Patino et al.
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification
Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee
PDF: Polyphone Disambiguation in Chinese by Using FLAT
Haiteng Zhang
Perception of Social Speaker Characteristics in Synthetic Speech
Sai Sirisha Rallabandi, Abhinav Bharadwaj, Babak Naderi et al.
Perception of Standard Arabic Synthetic Speech Rate
Yahya Aldholmi, Rawan Aldhafyan, Asma Alqahtani
Perceptual Contributions of Vowels and Consonant-Vowel Transitions in Understanding Time-Compressed Mandarin Sentences
Changjie Pan, Feng Yang, Fei Chen
Personalized Keyphrase Detection Using Speaker and Environment Information
Rajeev Rikhye, Quan Wang, Qiao Liang et al.
Personalized PercepNet: Real-Time, Low-Complexity Target Voice Separation and Enhancement
Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin et al.
Personalized Speech Enhancement Through Self-Supervised Data Augmentation and Purification
Aswin Sivaraman, Sunwoo Kim, Minje Kim
Phone-Level Pronunciation Scoring for Spanish Speakers Learning English Using a GOP-DNN System
Jazmín Vidal, Cyntia Bonomi, Marcelo Sancinetti et al.
Phoneme-Aware and Channel-Wise Attentive Learning for Text Dependent Speaker Verification
Yan Liu, Zheng Li, Lin Li et al.
PhonemeBERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa
Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Phoneme Recognition Through Fine Tuning of Phonetic Representations: A Case Study on Luhya Language Varieties
Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos et al.
Phoneme-to-Audio Alignment with Recurrent Neural Networks for Speaking and Singing Voice
Yann Teytaut, Axel Roebel