Papers
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
Patrick K. O’Neill, Vitaly Lavrukhin, Somshubra Majumdar et al.
Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition
Magdalena Rybicka, Jesús Villalba, Piotr Żelasko et al.
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Ian Palmer, Andrew Rouditchenko, Andrei Barbu et al.
Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings
Jan Švec, Luboš Šmídl, Josef V. Psutka et al.
SRI-B End-to-End System for Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
Hardik Sailor, Kiran Praveen T, Vikas Agrawal et al.
SRIB-LEAP Submission to Far-Field Multi-Channel Speech Enhancement Challenge for Video Conferencing
R.G. Prithvi Raj, Rohit Kumar, M.K. Jayesh et al.
Stabilizing Label Assignment for Speech Separation by Self-Supervised Pre-Training
Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu et al.
StableEmit: Selection Probability Discount for Reducing Emission Latency of Streaming Monotonic Attention ASR
Hirofumi Inaguma, Tatsuya Kawahara
Stacked Recurrent Neural Networks for Speech-Based Inference of Attachment Condition in School Age Children
Huda Alsofyani, Alessandro Vinciarelli
StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion
Yinghao Aaron Li, Ali Zare, Nima Mesgarani
StarGAN-VC+ASR: StarGAN-Based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition
Shoki Sakamoto, Akira Taniguchi, Tadahiro Taniguchi et al.
Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models
Shucong Zhang, Erfan Loweimi, Peter Bell et al.
Stochastic Process Regression for Cross-Cultural Speech Emotion Recognition
Mani Kumar T, Enrique Sanchez, Georgios Tzimiropoulos et al.
Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models
Tianzi Wang, Yuya Fujita, Xuankai Chang et al.
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture
Takafumi Moriya, Tomohiro Tanaka, Takanori Ashihara et al.
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification
Liang Lu, Naoyuki Kanda, Jinyu Li et al.
Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation
Vineet Garg, Wonil Chang, Siddharth Sigtia et al.
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee, Kyumin Park, Daeyoung Kim
Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing
Babak Naderi, Ross Cutler
Subtitle Translation as Markup Translation
Colin Cherry, Naveen Arivazhagan, Dirk Padfield et al.
SUPERB: Speech Processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang et al.
Super-Human Performance in Online Low-Latency Recognition of Conversational Speech
Thai-Son Nguyen, Sebastian Stüker, Alex Waibel
Synchronic Fortition in Five Romance Languages? A Large Corpus-Based Study of Word-Initial Devoicing
Mathilde Hutin, Yaru Wu, Adèle Jatteau et al.
Synchronising Speech Segments with Musical Beats in Mandarin and English Singing
Cong Zhang, Jian Zhu
SynthASR: Unlocking Synthetic Data for Speech Recognition
Amin Fazel, Wei Yang, Yulan Liu et al.