Papers
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang, Titouan Parcollet, Rogier van Dalen et al.
LingWav2Vec2: Linguistic-augmented wav2vec 2.0 for Vietnamese Mispronunciation Detection
Tuan Nguyen, Huy Dat Tran
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh, Sonal Kumar, Ashish Seth et al.
Listeners' F0 preferences in quiet and stationary noise
Olympia Simantiraki, Martin Cooke
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Zhenxiong Tan, Xinyin Ma, Gongfan Fang et al.
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition
Eunseop Yoon, Hee Suk Yoon, John Harvill et al.
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung Dang, David Aponte, Dung Tran et al.
LLM-Driven Multimodal Opinion Expression Identification
Bonian Jia, Huiyao Chen, Yueheng Sun et al.
Locally Aligned Rectified Flow Model for Speech Enhancement Towards Single-Step Diffusion
Zhengxiao Li, Nakamasa Inoue
LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information
Yunrui Cai, Zhiyong Wu, Jia Jia et al.
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Zheshu Song, Jianheng Zhuo, Yifan Yang et al.
Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer
Slava Shechtman, Avihu Dekel
Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network
Yanxiong Li, Jiaxin Tan, Guoqing Chen et al.
Low-dimensional Style Token Control for Hyperarticulated Speech Synthesis
Miku Nishihara, Dan Wells, Korin Richmond et al.
LungAdapter: Efficient Adapting Audio Spectrogram Transformer for Lung Sound Classification
Li Xiao, Lucheng Fang, Yuhong Yang et al.
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu, Jingyong Hou, Dong Yang et al.
M2ASR: Multilingual Multi-task Automatic Speech Recognition via Multi-objective Optimization
A F M Saif, Lisha Chen, Xiaodong Cui et al.
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi et al.
Macro-descriptors for Alzheimer's disease detection using large language models
Catarina Botelho, John Mendonça, Anna Pompili et al.
Magnitude and timing of acceleration peaks in stressed and unstressed syllables
Malin Svensson Lundmark
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance
Semin Kim, Myeonghun Jeong, Hyeonseung Lee et al.
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang, Ziyang Ma, Fan Yu et al.
MaskSR: Masked Language Model for Full-band Speech Restoration
Xu Li, Qirui Wang, Xiaoyu Liu