Papers
Spot Keywords From Very Noisy and Mixed Speech
Ying Shi, Dong Wang, Lantian Li et al.
SR-SRP: Super-Resolution based SRP-PHAT for Sound Source Localization and Tracking
Jae-Heung Cho, Joon-Hyuk Chang
Stable Speech Emotion Recognition with Head-k-Pooling Loss
Chaoyue Ding, Jiakui Li, Daoming Zong et al.
STE-GAN: Speech-to-Electromyography Signal Conversion using Generative Adversarial Networks
Kevin Scheck, Tanja Schultz
STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework
Chung Tran, Chi Mai Luong, Sakriani Sakti
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun, Vincent Colotte, Emmanuel Vincent
Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models
Santosh Kesiraju, Marek Sarvaš, Tomáš Pavlíček et al.
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma, Niko Moritz, Stavros Petridis et al.
Streaming Dual-Path Transformer for Speech Enhancement
Soo Hyun Bae, Seok Wan Chae, Youngseok Kim et al.
Streaming Parrotron for on-device speech-to-speech conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang et al.
Streaming Speech-to-Confusion Network Speech Recognition
Denis Filimonov, Prabhat Pandey, Ariya Rastrow et al.
Stuttering Detection Application
Kowshik Siva Sai Motepalli, Vamshiraghusimha Narasinga, Harsha Pathuri et al.
StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation
Kun Song, Yi Ren, Yi Lei et al.
Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos
Chiori Hori, Puyuan Peng, David Harwath et al.
Supervised Contrastive Learning with Nearest Neighbor Search for Speech Emotion Recognition
Xuechen Wang, Shiwan Zhao, Yong Qin
Svarah: Evaluating English ASR Systems on Indian Accents
Tahir Javed, Sakshi Joshi, Vignesh Nagarajan et al.
SVVAD: Personal Voice Activity Detection for Speaker Verification
Zuheng Kang, Jianzong Wang, Junqing Peng et al.
SWRR: Feature Map Classifier Based on Sliding Window Attention and High-Response Feature Reuse for Multimodal Emotion Recognition
Ziping Zhao, Tian Gao, Haishuai Wang et al.
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
Puyuan Peng, Shang-Wen Li, Okko Räsänen et al.
Synthesis after a couple PINTs: Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception
Mikey Elmers, Johannah O'Mahony, Éva Székely
Synthetic Voice Spoofing Detection based on Feature Pyramid Conformer
Jingran Gong, Ning Chen
Tailored Real-Time Call Summarization System for Contact Centers
Aashraya Sachdeva, Sai Nishanth Padala, Anup Pattnaik et al.
Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text
Parnia Bahar, Mattia Di Gangi, Nick Rossenbach et al.
Target Active Speaker Detection with Audio-visual Cues
Yidi Jiang, Ruijie Tao, Zexu Pan et al.
Target Speech Extraction with Conditional Diffusion Model
Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani