Papers
Normalization Driven Zero-Shot Multi-Speaker Speech Synthesis
Neeraj Kumar, Srishti Goel, Ankur Narang et al.
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement
Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae et al.
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
Junhyeok Lee, Seungu Han
On-Device Streaming Transformer-Based End-to-End Speech Recognition
Yoo Rhee Oh, Kiyoung Park
One-Shot Voice Conversion with Speaker-Agnostic StarGAN
Sefik Emre Eskimez, Dimitrios Dimitriadis, Kenichi Kumatani et al.
One Size Does Not Fit All in Resource-Constrained ASR
Ethan Morris, Robbie Jimerson, Emily Prud’hommeaux
Online Blind Audio Source Separation Using Recursive Expectation-Maximization
Aviad Eisenberg, Boaz Schwartz, Sharon Gannot
Online Compressive Transformer for End-to-End Speech Recognition
Chi-Hang Leong, Yu-Han Huang, Jen-Tzung Chien
Online Speaker Diarization Equipped with Discriminative Modeling and Guided Inference
Xucheng Wan, Kai Liu, Huan Zhou
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers
Yawen Xue, Shota Horiguchi, Yusuke Fujita et al.
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer
Liang Lu, Zhong Meng, Naoyuki Kanda et al.
On Modeling Glottal Source Information for Phonation Assessment in Parkinson’s Disease
J.C. Vásquez-Correa, Julian Fritsch, J.R. Orozco-Arroyave et al.
On Sampling-Based Training Criteria for Neural Language Modeling
Yingbo Gao, David Thulke, Alexander Gerstenberger et al.
On the Design of Deep Priors for Unsupervised Audio Restoration
Vivek Sivaraman Narayanaswamy, Jayaraman J. Thiagarajan, Andreas Spanias
On the Feasibility of the Danish Model of Intonational Transcription: Phonetic Evidence from Jutlandic Danish
Anna Bothe Jespersen, Pavel Šturm, Míša Hejná
On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR
Tsz Kin Lam, Mayumi Ohta, Shigehiko Schamoni et al.
On the Learning Dynamics of Semi-Supervised Training for ASR
Electra Wallington, Benji Kershenbaum, Ondřej Klejch et al.
On the Limit of English Conversational Speech Recognition
Zoltán Tüske, George Saon, Brian Kingsbury
OpenASR20: An Open Challenge for Automatic Speech Recognition of Conversational Telephone Speech in Low-Resource Languages
Kay Peterson, Audrey Tong, Yan Yu
Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation
Piyush Vyas, Anastasia Kuznetsova, Donald S. Williamson
Optimising Hearing Aid Fittings for Speech in Noise with a Differentiable Hearing Loss Model
Zehai Tu, Ning Ma, Jon Barker
Optimizing an Automatic Creaky Voice Detection Method for Australian English Speaking Females
Hannah White, Joshua Penney, Andy Gibson et al.
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Chiori Hori, Takaaki Hori, Jonathan Le Roux
ORCA-SLANG: An Automatic Multi-Stage Semi-Supervised Deep Learning Framework for Large-Scale Killer Whale Call Type Identification
Christian Bergler, Manuel Schmitt, Andreas Maier et al.