Papers
The Third DIHARD Diarization Challenge
Neville Ryant, Prachi Singh, Venkat Krishnamohan et al.
The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20
Jing Zhao, Zhiqiang Lv, Ambyera Han et al.
The Zero Resource Speech Challenge 2021: Spoken Language Modelling
Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis et al.
Three-Class Overlapped Speech Detection Using a Convolutional Recurrent Neural Network
Jee-weon Jung, Hee-Soo Heo, Youngki Kwon et al.
Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-Trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J.C. Wang, Lu Wang, Yandan Sun et al.
Tied & Reduced RNN-T Decoder
Rami Botros, Tara N. Sainath, Robert David et al.
Time Delay Estimation for Speaker Localization Using CNN-Based Parametrized GCC-PHAT Features
Daniele Salvati, Carlo Drioli, Gian Luca Foresti
Time-Frequency Representation Learning with Graph Convolutional Network for Dialogue-Level Speech Emotion Recognition
Jiaxing Liu, Yaodong Song, Longbiao Wang et al.
Time-to-Event Models for Analyzing Reaction Time Sequences
Louis ten Bosch, Lou Boves
Timing Generating Networks: Neural Network Based Precise Turn-Taking Timing Prediction in Multiparty Conversation
Shinya Fujie, Hayato Katayama, Jin Sakuma et al.
Token-Level Supervised Contrastive Learning for Punctuation Restoration
Qiushi Huang, Tom Ko, H. Lilian Tang et al.
Toward Genre Adapted Closed Captioning
François Buet, François Yvon
Towards an Accent-Robust Approach for ATC Communications Transcription
Nataly Jahchan, Florentin Barbier, Ariyanidevi Dharma Gita et al.
Towards Automatic Speech Recognition for People with Atypical Speech
Heidi Christensen
Towards Automatic Speech to Sign Language Generation
Parul Kapoor, Rudrabha Mukhopadhyay, Sindhu B. Hegde et al.
Towards Lifelong Learning of End-to-End ASR
Heng-Jui Chang, Hung-yi Lee, Lin-shan Lee
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li, Changhe Song, Jingbei Li et al.
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
Shammur Absar Chowdhury, Amir Hussein, Ahmed Abdelali et al.
Towards Simultaneous Machine Interpretation
Alejandro Pérez-González-de-Martos, Javier Iranzo-Sánchez, Adrià Giménez Pastor et al.
Towards the Explainability of Multimodal Speech Emotion Recognition
Puneet Kumar, Vishesh Kaushik, Balasubramanian Raman
Towards the Prediction of the Vocal Tract Shape from the Sequence of Phonemes to be Articulated
Vinicius Ribeiro, Karyna Isaieva, Justine Leclere et al.
Toward Streaming ASR with Non-Autoregressive Insertion-Based Model
Yuya Fujita, Tianzi Wang, Shinji Watanabe et al.
Towards Unsupervised Phone and Word Segmentation Using Self-Supervised Vector-Quantized Neural Networks
Herman Kamper, Benjamin van Niekerk
Training Hybrid Models on Noisy Transliterated Transcripts for Code-Switched Speech Recognition
Matthew Wiesner, Mousmita Sarma, Ashish Arora et al.
Transcribing Paralinguistic Acoustic Cues to Target Language Text in Transformer-Based Speech-to-Text Translation
Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh et al.