Papers
Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture
Karan Singla, Shahab Jalalvand, Yeon-Jun Kim et al.
SF-DST: Few-Shot Self-Feeding Reading Comprehension Dialogue State Tracking with Auxiliary Task
Jihyun Lee, Gary Geunbae Lee
Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization
Evelina Bakhturina, Yang Zhang, Boris Ginsburg
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa et al.
SiDi KWS: A Large-Scale Multilingual Dataset for Keyword Spotting
Michel Cardoso Meneses, Rafael Bérgamo Holanda, Luis Vasconcelos Peres et al.
SiD-WaveFlow: A Low-Resource Vocoder Independent of Prior Knowledge
Yuhan Li, Ying Shen, Dongqing Wang et al.
Significance of single frequency filter for the development of children’s KWS system
Biswaranjan Pattanayak, Gayadhar Pradhan
Similarity and Content-based Phonetic Self Attention for Speech Recognition
Kyuhong Shim, Wonyong Sung
Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody
Peter Makarov, Syed Ammar Abbas, Mateusz Łajszczak et al.
Simple and Effective Unsupervised Speech Synthesis
Alexander H. Liu, Cheng-I Lai, Wei-Ning Hsu et al.
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition
Qiantong Xu, Alexei Baevski, Michael Auli
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy
Shuai Guo, Jiatong Shi, Tao Qian et al.
Single-channel speech enhancement using Graph Fourier Transform
Chenhui Zhang, Xiang Pan
SKYE: More than a conversational AI
Alzahra Badi, Chungho Park, Minseok Keum et al.
Small Changes Make Big Differences: Improving Multi-turn Response Selection in Dialogue Systems via Fine-Grained Contrastive Learning
Yuntao Li, Can Xu, Huang Hu et al.
Small Footprint Neural Networks for Acoustic Direction of Arrival Estimation
Zhiheng Ouyang, Miao Wang, Wei-Ping Zhu
SNRi Target Training for Joint Speech Enhancement and Recognition
Yuma Koizumi, Shigeki Karita, Arun Narayanan et al.
Soft-label Learn for No-Intrusive Speech Quality Assessment
Junyong Hao, Shunzhou Ye, Cheng Lu et al.
SoftSpeech: Unsupervised Duration Model in FastSpeech 2
Yuan-Hao Yi, Lei He, Shifeng Pan et al.
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis
Georgia Maniati, Alexandra Vioni, Nikolaos Ellinas et al.
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
Artem Ploujnikov, Mirco Ravanelli
SoundDoA: Learn Sound Source Direction of Arrival and Semantics from Sound Raw Waveforms
Yuhang He, Andrew Markham
Space-Efficient Representation of Entity-centric Query Language Models
Christophe Van Gysel, Mirko Hannemann, Ernest Pusateri et al.
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
Sreyan Ghosh, Sonal Kumar, Yaman Kumar et al.
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Jie Wang, Yuji Liu, Binling Wang et al.