Papers
PodcastMix: A dataset for separating music and speech in podcasts
Nicolás Schmidt, Jordi Pons, Marius Miron
PoeticTTS - Controllable Poetry Reading for Literary Studies
Julia Koch, Florian Lux, Nadja Schauffler et al.
Positional Encoding for Capturing Modality Specific Cadence for Emotion Detection
Hira Dhamyal, Bhiksha Raj, Rita Singh
Practical Over-the-air Perceptual Acoustic Watermarking
Ameya Agaskar
Predicting Emotional Intensity in Political Debates via Non-verbal Signals
Jeewoo Yoon, Jinyoung Han, Erik Bucy et al.
Predicting label distribution improves non-intrusive speech quality estimation
Abu Zaher Md Faridee, Hannes Gamper
Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks
Cassia Valentini-Botinhao, Manuel Sam Ribeiro, Oliver Watts et al.
Predicting Speech Intelligibility using the Spike Activity Mutual Information Index
Franklin Alvarez Cardinale, Waldo Nogueira
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis
Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi et al.
Prediction of L2 speech proficiency based on multi-level linguistic features
Verdiana De Fino, Lionel Fontan, Julien Pinquier et al.
Pre-trained Speech Representations as Feature Extractors for Speech Quality Assessment in Online Conferencing Applications
Bastiaan Tamm, Helena Balabin, Rik Vandenberghe et al.
Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data
Junyi Ao, Ziqiang Zhang, Long Zhou et al.
Preventing sensitive-word recognition using self-supervised learning to preserve user-privacy for automatic speech recognition
Yuchen Liu, Apu Kapadia, Donald Williamson
PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
Siqi Zheng, Hongbin Suo, Qian Chen
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings
Niko Brummer, Albert Swart, Ladislav Mosner et al.
Probing phoneme, language and speaker information in unsupervised speech representations
Maureen de Seyssel, Marvin Lavechin, Yossi Adi et al.
Probing speech emotion recognition transformers for linguistic knowledge
Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf et al.
Production characteristics of obstruents in WaveNet and older TTS systems
Ayushi Pandey, Sébastien Le Maguer, Julie Carson-Berndsen et al.
Production federated keyword spotting via distillation, filtering, and joint federated-centralized training
Andrew Hard, Kurt Partridge, Neng Chen et al.
Production Strategies of Vocal Attitudes
Léane Salais, Pablo Arias, Clément Le Moine et al.
Prompt-based Re-ranking Language Model for ASR
Mengxi Nie, Ming Yan, Caixia Gong
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations
Chang Liu, Zhen-Hua Ling, Ling-Hui Chen
Prosodic alignment for off-screen automatic dubbing
Yogesh Virkar, Marcello Federico, Robert Enyedi et al.
Prosodic Information in Dialect Identification of a Tonal Language: The case of Ao
Moakala Tzudir, Priyankoo Sarmah, S R Mahadeva Prasanna
Prototypical speaker-interference loss for target voice separation using non-parallel audio samples
Seongkyu Mun, Dhananjaya Gowda, Jihwan Lee et al.