Papers

Automatic Speech Recognition Transformer with Global Contextual Information Decoder

Yukun Qian, Xuyi Zhuang, Mingjiang Wang

2023 INTERSPEECH

Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis

Seongyeon Park, Bohyung Kim, Tae-Hyun Oh

2023 INTERSPEECH

Average Token Delay: A Latency Metric for Simultaneous Translation

Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

2023 INTERSPEECH

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models

Marvin Lavechin, Yaya Sy, Hadrien Titeux et al.

2023 INTERSPEECH

Background-aware Modeling for Weakly Supervised Sound Event Detection

Yifei Xin, Dongchao Yang, Yuexian Zou

2023 INTERSPEECH

Background Domain Switch: A Novel Data Augmentation Technique for Robust Sound Event Detection

Wei-Cheng Lin, Luca Bondi, Shabnam Ghaffarzadegan

2023 INTERSPEECH

Background-Sound Controllable Voice Source Separation

Deokjun Eom, Woo Hyun Nam, Kyung-Rae Kim

2023 INTERSPEECH

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

Jie Zhang, QingTian Xu, Qiu-Shi Zhu et al.

2023 INTERSPEECH

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

Yuhao Liang, Fan Yu, Yangze Li et al.

2023 INTERSPEECH

BASS: Block-wise Adaptation for Speech Summarization

Roshan Sharma, Siddhant Arora, Kenneth Zheng et al.

2023 INTERSPEECH

BAT: Boundary aware transducer for memory-efficient and low-latency ASR

Keyu An, Xian Shi, Shiliang Zhang

2023 INTERSPEECH

Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data

Salvatore Fara, Orlaith Hickey, Alexandra Georgescu et al.

2023 INTERSPEECH

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Jinchuan Tian, Jianwei Yu, Hangting Chen et al.

2023 INTERSPEECH

Beatboxing Kick Drum Kinematics

Reed Blaylock, Shrikanth Narayanan

2023 INTERSPEECH

BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Ahana Deb, Sayan Nag, Ayan Mahapatra et al.

2023 INTERSPEECH

Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer

Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck

2023 INTERSPEECH

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Rui Liu, Jinhua Zhang, Guanglai Gao et al.

2023 INTERSPEECH

Beyond Style: Synthesizing Speech with Pragmatic Functions

Harm Lameris, Joakim Gustafson, Éva Székely

2023 INTERSPEECH

Beyond the AI hype: Balancing Innovation and Social Responsibility

Virginia Dignum

2023 INTERSPEECH

Biased Self-supervised Learning for ASR

Florian L. Kreyssig, Yangyang Shi, Jinxi Guo et al.

2023 INTERSPEECH

Binaural Sound Localization in Noisy Environments Using Frequency-Based Audio Vision Transformer (FAViT)

Waradon Phokhinanan, Nicolas Obin, Sylvain Argentieri

2023 INTERSPEECH

Biophysically-inspired single-channel speech enhancement in the time domain

Chuan Wen, Sarah Verhulst

2023 INTERSPEECH

Blank Collapse: Compressing CTC Emission for the Faster Decoding

Minkyu Jung, Ohhyeok Kwon, Seunghyun Seo et al.

2023 INTERSPEECH

Blank-regularized CTC for Frame Skipping in Neural Transducer

Yifan Yang, Xiaoyu Yang, Liyong Guo et al.

2023 INTERSPEECH

Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network

Zhiheng Liao, Feifei Xiong, Juan Luo et al.

2023 INTERSPEECH