Papers
Automatic Speech Recognition Transformer with Global Contextual Information Decoder
Yukun Qian, Xuyi Zhuang, Mingjiang Wang
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seongyeon Park, Bohyung Kim, Tae-Hyun Oh
Average Token Delay: A Latency Metric for Simultaneous Translation
Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
Marvin Lavechin, Yaya Sy, Hadrien Titeux et al.
Background-aware Modeling for Weakly Supervised Sound Event Detection
Yifei Xin, Dongchao Yang, Yuexian Zou
Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection
Wei-Cheng Lin, Luca Bondi, Shabnam Ghaffarzadegan
Background-Sound Controllable Voice Source Separation
Deokjun Eom, Woo Hyun Nam, Kyung-Rae Kim
BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions
Jie Zhang, QingTian Xu, Qiu-Shi Zhu et al.
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Yuhao Liang, Fan Yu, Yangze Li et al.
BASS: Block-wise Adaptation for Speech Summarization
Roshan Sharma, Siddhant Arora, Kenneth Zheng et al.
BAT: Boundary aware transducer for memory-efficient and low-latency ASR
Keyu An, Xian Shi, Shiliang Zhang
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data
Salvatore Fara, Orlaith Hickey, Alexandra Georgescu et al.
Bayes Risk Transducer: Transducer with Controllable Alignment Prediction
Jinchuan Tian, Jianwei Yu, Hangting Chen et al.
Beatboxing Kick Drum Kinematics
Reed Blaylock, Shrikanth Narayanan
BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion
Ahana Deb, Sayan Nag, Ayan Mahapatra et al.
Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer
Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion
Rui Liu, Jinhua Zhang, Guanglai Gao et al.
Beyond Style: Synthesizing Speech with Pragmatic Functions
Harm Lameris, Joakim Gustafson, Éva Székely
Biased Self-supervised Learning for ASR
Florian L. Kreyssig, Yangyang Shi, Jinxi Guo et al.
Binaural Sound Localization in Noisy Environments Using Frequency-Based Audio Vision Transformer (FAViT)
Waradon Phokhinanan, Nicolas Obin, Sylvain Argentieri
Biophysically-inspired single-channel speech enhancement in the time domain
Chuan Wen, Sarah Verhulst
Blank Collapse: Compressing CTC Emission for the Faster Decoding
Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo et al.
Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang, Xiaoyu Yang, Liyong Guo et al.
Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network
Zhiheng Liao, Feifei Xiong, Juan Luo et al.