Papers
Karaoker: Alignment-free singing voice synthesis with speech training data
Panagiotis Kakoulidis, Nikolaos Ellinas, Georgios Vamvoukakis et al.
KaraTuner: Towards End-to-End Natural Pitch Correction for Singing Voice in Karaoke
Xiaobin Zhuang, Huiran Yu, Weifeng Zhao et al.
Keyword Spotting with Synthetic Data using Heterogeneous Knowledge Distillation
Yuna Lee, Seung Jun Baek
kidsTALC: A Corpus of 3- to 11-year-old German Children’s Connected Natural Speech
Lars Rumberg, Christopher Gebauer, Hanna Ehlert et al.
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning
Sanli Tian, Keqi Deng, Zehan Li et al.
Knowledge distillation for In-memory keyword spotting model
Zeyang Song, Qi Liu, Qu Yang et al.
Knowledge Distillation via Module Replacing for Automatic Speech Recognition with Recurrent Neural Network Transducer
Kaiqi Zhao, Hieu Nguyen, Animesh Jain et al.
Knowledge of accent differences can be used to predict speech recognition
Tuende Szalay, Mostafa Shahin, Beena Ahmed et al.
Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition
Xun Gong, Zhikai Zhou, Yanmin Qian
KSC2: An Industrial-Scale Open-Source Kazakh Speech Corpus
Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables
Jounghee Kim, Pilsung Kang
L2-GEN: A Neural Phoneme Paraphrasing Approach to L2 Speech Synthesis for Mispronunciation Diagnosis
Daniel Zhang, Ashwinkumar Ganesan, Sarah Campbell et al.
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian, Jianwei Yu, Chunlei Zhang et al.
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Hyun-Wook Yoon, Ohsung Kwon, Hoyeon Lee et al.
Language-specific Characteristic Assistance for Code-switching Speech Recognition
Tongtong Song, Qiang Xu, Meng Ge et al.
Language-specific interactions of vowel discrimination in noise
Mark Gibson, Marcel Schlechtweg, Beatriz Blecua Falgueras et al.
Large-Scale Streaming End-to-End Speech Translation with Neural Transducers
Jian Xue, Peidong Wang, Jinyu Li et al.
Latency Control for Keyword Spotting
Christin Jose, Joe Wang, Grant Strimel et al.
LCSM: A Lightweight Complex Spectral Mapping Framework for Stereophonic Acoustic Echo Cancellation
Chenggang Zhang, JinJiang Liu, Xueliang Zhang
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Heyang Xue, Xinsheng Wang, Yongmao Zhang et al.
Learnable Sparse Filterbank for Speaker Verification
Junyi Peng, Rongzhi Gu, Ladislav Mošner et al.
Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting
Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim et al.
Learning from human perception to improve automatic speaker verification in style-mismatched conditions
Amber Afshan, Abeer Alwan
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu