Papers
Improving wav2vec2-based Spoken Language Identification by Learning Phonological Features
Mostafa Shahin, Zheng Nan, Vidhyasaharan Sethu et al.
Improving WaveRNN with Heuristic Dynamic Blending for Fast and High-Quality GPU Vocoding
Muyang Du, Chuan Liu, Jiaxing Qi et al.
Improving Zero-shot Cross-domain Slot Filling via Transformer-based Slot Semantics Fusion
Yuhang Li, Xiao Wei, Yuke Si et al.
Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition
Jisung Wang, Haram Lee, Myungwoo Oh
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff
Peter Polák, Brian Yan, Shinji Watanabe et al.
Influence of Personal Traits on Impressions of One's Own Voice
Hikaru Yanagida, Yusuke Ijima, Naohiro Tawara
Influence of Utterance and Speaker Characteristics on the Classification of Children with Cleft Lip and Palate
Ilja Baumann, Dominik Wagner, Franziska Braun et al.
Information Magnitude Based Dynamic Sub-sampling for Speech-to-text
Yuhao Zhang, Chenghao Gao, Kaiqi Kou et al.
Insights into end-to-end audio-to-score transcription with real recordings: A case study with saxophone works
Juan Carlos Martínez-Sevilla, María Alfaro-Contreras, Jose J. Valero-Mas et al.
Instance-based Temporal Normalization for Speaker Verification
Thanathai Lertpetchpun, Ekapol Chuangsuwanich
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders
Helen Meng, Brian Mak, Man-Wai Mak et al.
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Wen Wu, Chao Zhang, Philip C. Woodland
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding
Siddhant Arora, Hayato Futami, Yosuke Kashiwagi et al.
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi et al.
Intelligible Lip-to-Speech Synthesis with Speech Units
Jeongsoo Choi, Minsu Kim, Yong Man Ro
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
Yuta Nishikawa, Satoshi Nakamura
InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition
Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu et al.
Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion
Mohammad Hassan Vali, Tom Bäckström
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan, Tao Li, Yishuang Li et al.
Intonation Control for Neural Text-to-Speech Synthesis with Polynomial Models of F0
Niamh Corkey, Johannah O'Mahony, Simon King
Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition
Dohee Kim, Jieun Choi, Joon-Hyuk Chang
Introducing Self-Supervised Phonetic Information for Text-Independent Speaker Verification
Ziyang Zhang, Wu Guo, Bin Gu
〈'〉 in Tsimane': a Preliminary Investigation
William N. Havard, Yaya Sy, Camila Scaff et al.