Papers
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks
Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
SDNet: Stream-attention and Dual-feature Learning Network for Ad-hoc Array Speech Separation
Honglong Wang, Chengyun Deng, Yanjie Fu et al.
Second language identification of Vietnamese tones by native Mandarin learners
Juqiang Chen, Ailing Qin, Hui Chang et al.
SEF-Net: Speaker Embedding Free Target Speaker Extraction Network
Bang Zeng, Suo Hongbin, Yulong Wan et al.
Segmental features of Brazilian (Santa Catarina) Hunsrik
Dennis Hoffmann, Maria O'Reilly
Segmental SpeechCLIP: Utilizing Pretrained Image-text Models for Audio-Visual Learning
Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez et al.
Selective Biasing with Trie-based Contextual Adapters for Personalised Speech Recognition using Neural Transducers
Philip Harding, Sibo Tong, Simon Wiesler
"Select language, modality or put on a mask!" Experiments with Multimodal Emotion Recognition
Paweł Bujnowski, Bartłomiej Kuźma, Bartłomiej Paziewski et al.
Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization
Ye-Rin Jeoung, Jeong-Hwan Choi, Ju-Seok Seong et al.
Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition
Saurabh Kataria, Jesús Villalba, Laureano Moro-Velazquez et al.
Self-Paced Pattern Augmentation for Spoken Term Detection in Zero-Resource
Sudhakar P, Sreenivasa K. Rao, Pabitra Mitra
Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder
Jingru Lin, Xianghu Yue, Junyi Ao et al.
Self-Supervised Dataset Pruning for Efficient Training in Audio Anti-spoofing
Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Heng-Jui Chang, Alexander H. Liu, James Glass
Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory
Rui Li, Zhiwei Xie, Haihua Xu et al.
Self-supervised learning with Diffusion-based multichannel speech enhancement for speaker verification under noisy conditions
Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel et al.
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
Oli Danyi Liu, Hao Tang, Sharon Goldwater
Self-Supervised Solution to the Control Problem of Articulatory Synthesis
Paul K. Krug, Peter Birkholz, Branislav Gerazov et al.
Semantic Enrichment Towards Efficient Speech Representations
Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay et al.
Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar et al.
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction
Mohan Shi, Yuchun Shu, Lingyun Zuo et al.
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge
Jiaxu Zhu, Changhe Song, Zhiyong Wu et al.
Semi-supervised Learning for Continuous Emotional Intensity Controllable Speech Synthesis with Disentangled Representations
Yoori Oh, Juheon Lee, Yoseob Han et al.
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding
Yi Xuan Tan, Navonil Majumder, Soujanya Poria
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna et al.