Papers
Streaming Align-Refine for Non-autoregressive Deliberation
Wang Weiran, Ke Hu, Tara Sainath
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection
Yui Sudo, Shakeel Muhammad, Kazuhiro Nakadai et al.
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Chao Zhang, Bo Li, Tara Sainath et al.
Streaming Intended Query Detection using E2E Modeling for Continued Conversation
Shuo-Yiin Chang, Guru Prakash, Zelin Wu et al.
Streaming model for Acoustic to Articulatory Inversion with transformer networks
Sathvik Udupa, Aravind Illa, Prasanta Ghosh
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda, Jian Wu, Yu Wu et al.
Streaming parallel transducer beam search with fast slow cascaded encoders
Jay Mahadeokar, Yangyang Shi, Ke Li et al.
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda, Jian Wu, Yu Wu et al.
Streaming Target-Speaker ASR with Neural Transducer
Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai et al.
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent
Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi et al.
Sub-8-Bit Quantization Aware Training for 8-Bit Neural Network Accelerator with On-Device Speech Recognition
Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta et al.
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang, Yiming Wang, Yu Wu et al.
SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Schoburg Carrillo de Mira, Alexandros Haliassos, Stavros Petridis et al.
Syllable sequence of /a/+/ta/ can be heard as /atta/ in Japanese with visual or tactile cues
Takayuki Arai, Miho Yamada, Megumi Okusawa
TALCS: An open-source Mandarin-English code-switching corpus and a speech recognition baseline
Chengfei Li, Shuhao Deng, Yaoping Wang et al.
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Xianrui Zheng, Chao Zhang, Phil Woodland
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
Zifeng Zhao, Dongchao Yang, Rongzhi Gu et al.
TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor’s Approximation Theory
Andong Li, Guochen Yu, Chengshi Zheng et al.
TB or not TB? Acoustic cough analysis for tuberculosis classification
Geoffrey T. Frost, Grant Theron, Thomas Niesler
Telling self-defining memories: An acoustic study of natural emotional speech productions
Veronique Delvaux, Audrey Lavallée, Fanny Degouis et al.
Temporal coding with magnitude-phase regularization for sound event detection
Sangwook Park, Sandeep Reddy Kothinti, Mounya Elhilali
Temporal Self Attention-Based Residual Network for Environmental Sound Classification
Achyut Tripathi, Konark Paul
Text aware Emotional Text-to-speech with BERT
Arijit Mukherjee, Shubham Bansal, Sandeepkumar Satpal et al.
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Yookyung Shin, Younggun Lee, Suhee Jo et al.