Papers
Sign Value Constraint Decomposition for Efficient 1-Bit Quantization of Speech Translation Tasks
Nan Chen, Yonghe Wang, Feilong Bao
SilentCipher: Deep Audio Watermarking
Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao et al.
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang, Dingdong Wang, Haohan Guo et al.
Simulating articulatory trajectories with phonological feature interpolation
Angelo Ortiz Tandazo, Thomas Schatz, Thomas Hueber et al.
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Haoyu Wang, Guoqiang Hu, Guodong Lin et al.
SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness
Jie Lin, Xiuping Yang, Li Xiao et al.
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
Jiatong Shi, Yueqian Lin, Xinyi Bai et al.
Singing Voice Graph Modeling for SingFake Detection
Xuanjun Chen, Haibin Wu, Roger Jang et al.
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation
Hanzhao Li, Liumeng Xue, Haohan Guo et al.
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Yuxun Tang, Yuning Wu, Jiatong Shi et al.
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle, Nicolas Obin, Axel Roebel
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation
Peidong Wang, Jian Xue, Jinyu Li et al.
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization
Naoki Makishima, Naotaka Kawata, Mana Ihori et al.
“So . . . my child . . . ” – How Child ADHD Influences the Way Parents Talk
Anika A. Spiesberger, Andreas Triantafyllopoulos, Alexander Kathan et al.
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework
Hokuto Munakata, Ryo Terashima, Yusuke Fujita
SOT Triggered Neural Clustering for Speaker Attributed ASR
Xianrui Zheng, Guangzhi Sun, Chao Zhang et al.
Sound Event Bounding Boxes
Janek Ebbers, François G. Germain, Gordon Wichern et al.
Sound of Traffic: A Dataset for Acoustic Traffic Identification and Counting
Shabnam Ghaffarzadegan, Luca Bondi, Wei-Chang Lin et al.
Sound of Vision: Audio Generation from Visual Text Embedding through Training Domain Discriminator
Jaewon Kim, Won-Gook Choi, Seyun Ahn et al.
Source Tracing of Audio Deepfake Systems
Nicholas Klein, Tianxiang Chen, Hemlata Tak et al.
Sparse Binarization for Fast Keyword Spotting
Jonathan Svirsky, Uri Shaham, Ofir Lindenbaum
SparseWAV: Fast and Accurate One-Shot Unstructured Pruning for Large Speech Foundation Models
Tianteng Gu, Bei Liu, Hang Shao et al.
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai, Fengping Wang, Yingming Gao et al.
Spatial Acoustic Enhancement Using Unbiased Relative Harmonic Coefficients
Liang Tao, Maoshen Jia, Yonggang Hu et al.
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune et al.