Papers
Hearing Loss Affects Emotion Perception in Older Adults: Evidence from a Prosody-Semantics Stroop Task
Yingyang Wang, Min Xu, Jing Shao et al.
Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
Joun Yeop Lee, Jae-Sung Bae, Seongkyu Mun et al.
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh et al.
High Fidelity Speech Enhancement with Band-split RNN
Jianwei Yu, Hangting Chen, Yi Luo et al.
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu, Berrak Sisman, Mingyang Zhang et al.
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation
Cihan Xiao, Henry Li Xinyuan, Jinyi Yang et al.
How ChatGPT is Robust for Spoken Language Understanding?
Guangpeng Li, Lu Chen, Kai Yu
How Does Pretraining Improve Discourse-Aware Translation?
Zhihong Huang, Longyue Wang, Siyou Liu et al.
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura et al.
How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning
Hye-jin Shim, Rosa Gonzalez Hautamäki, Md Sahidullah et al.
How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li et al.
How to (Virtually) Train Your Speaker Localizer
Prerak Srivastava, Antoine Deleforge, Archontis Politis et al.
HumanDiffusion: diffusion model using perceptual gradients
Yota Ueda, Shinnosuke Takamichi, Yuki Saito et al.
Human Transcription Quality Improvement
Jian Gao, Hanbo Sun, Cheng Cao et al.
Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression
Hao Zhang, Meng Yu, Yuzhong Wu et al.
Hybrid Dataset for Speech Emotion Recognition in Russian Language
Vladimir Kondratenko, Nikolay Karpov, Artem Sokolov et al.
Hybrid Silent Speech Interface Through Fusion of Electroencephalography and Electromyography
Huiyan Li, Mingyi Wang, Han Gao et al.
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Florian Mai, Juan Zuluaga-Gomez, Titouan Parcollet et al.
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition
Tianzi Wang, Shoukang Hu, Jiajun Deng et al.
I Learned Error, I Can Fix It! : A Detector-Corrector Structure for ASR Error Calibration
Heui-Yeen Yeen, Min-Ju Kim, Myoung-Wan Koo
Image-driven Audio-visual Universal Source Separation
Chenxing Li, Ye Bai, Yang Wang et al.
Impact of Residual Noise and Artifacts in Speech Enhancement Errors on Intelligibility of Human and Machine
Shoko Araki, Ayako Yamamoto, Tsubasa Ochiai et al.
Implementing Contextual Biasing in GPU Decoder for Online ASR
Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello et al.
Implicit phonetic information modeling for speech emotion recognition
Tilak Purohit, Bogdan Vlasenko, Mathew Magimai.-Doss