Papers
Biometric Russian Audio-Visual Extended MASKS (BRAVE-MASKS) Corpus: Multimodal Mask Type Recognition Task
Maxim Markitantov, Elena Ryumina, Dmitry Ryumin et al.
BIT-MI Deep Learning-based Model to Non-intrusive Speech Quality Assessment Challenge in Online Conferencing Applications
Miao Liu, Jing Wang, Liang Xu et al.
Blind Language Separation: Disentangling Multilingual Cocktail Party Voices by Language
Marvin Borsdorf, Kevin Scheck, Haizhou Li et al.
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation
Keqi Deng, Shinji Watanabe, Jiatong Shi et al.
Boosting Self-Supervised Embeddings for Speech Enhancement
Kuo-Hsuan Hung, Szu-wei Fu, Huan-Hsin Tseng et al.
Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding
Pu Wang, Hugo Van hamme
Bottom-up discovery of structure and variation in response tokens (‘backchannels’) across diverse languages
Andreas Liesenfeld, Mark Dingemanse
Bring dialogue-context into RNN-T for streaming ASR
Junfeng Hou, Jinkun Chen, Wanyu Li et al.
Building African Voices
Perez Ogayo, Graham Neubig, Alan W Black
Building Vietnamese Conversational Smart Home Dataset and Natural Language Understanding Model
Thi Thu Trang Nguyen, Trung Duc Anh Dang, Quoc Viet Vu et al.
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Sangjun Park, Kihyun Choo, Joohyung Lee et al.
ByT5 model for massively multilingual grapheme-to-phoneme conversion
Jian Zhu, Cong Zhang, David Jurgens
Calibrate and Refine! A Novel and Agile Framework for ASR Error Robust Intent Detection
Peilin Zhou, Dading Chong, Helin Wang et al.
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng, Xiang Li, Zhiyong Wu et al.
Can Humans Correct Errors From System? Investigating Error Tendencies in Speaker Identification Using Crowdsourcing
Yuta Ide, Susumu Saito, Teppei Nakano et al.
CaTT-KWS: A Multi-stage Customized Keyword Spotting Framework based on Cascaded Transducer-Transformer
Zhanheng Yang, Sining Sun, Jin Li et al.
CAUSE: Crossmodal Action Unit Sequence Estimation from Speech
Hirokazu Kameoka, Takuhiro Kaneko, Shogo Seki et al.
CCATMos: Convolutional Context-aware Transformer Network for Non-intrusive Speech Quality Assessment
Yuchen Liu, Li-Chia Yang, Alexander Pawlicki et al.
Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Bowen Zhang, Songjun Cao, Xiaoming Zhang et al.
Chain-based Discriminative Autoencoders for Speech Recognition
Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng et al.
Challenges and Opportunities in Multi-device Speech Processing
Gregory Ciccarelli, Jarred Barber, Arun Nair et al.
Challenges in Metadata Creation for Massive Naturalistic Team-Based Audio Data
Chelzy Belitz, John H.L. Hansen
Challenges of using longitudinal and cross-domain corpora on studies of pathological speech
Catarina Botelho, Tanja Schultz, Alberto Abad et al.
Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments
Satwik Dutta, Sarah Anne Tao, Jacob C. Reyna et al.
Characterizing Therapist's Speaking Style in Relation to Empathy in Psychotherapy
Dehua Tao, Tan Lee, Harold Chui et al.