Papers
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus
Detai Xin, Shinnosuke Takamichi, Ai Morimatsu et al.
Learning A Self-Supervised Domain-Invariant Feature Representation for Generalized Audio Deepfake Detection
Yuankun Xie, Haonan Cheng, Yutian Wang et al.
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Muhammad Umar Farooq, Thomas Hain
Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech
Shijun Wang, Jón Guðnason, Damian Borth
Learning Local to Global Feature Aggregation for Speech Emotion Recognition
Cheng Lu, Hailun Lian, Wenming Zheng et al.
Learning to Compute the Articulatory Representations of Speech with the MIRRORNET
Yashish M Siriwardena, Carol Espy-Wilson, Shihab Shamma
Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models
Liam Dugan, Anshul Wadhawan, Kyle Spence et al.
Learning When to Trust Which Teacher for Weakly Supervised ASR
Aakriti Agrawal, Milind Rao, Anit Kumar Sahu et al.
Let's Give a Voice to Conversational Agents in Virtual Reality
Michele Yin, Gabriel Roccabruna, Abhinav Azad et al.
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn, Anton Ragni
Leveraging Label Information for Multimodal Emotion Recognition
Peiying Wang, Sunlu Zeng, Junqing Chen et al.
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
He Huang, Jagadeesh Balam, Boris Ginsburg
Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction
Rohit Paturi, Sundararajan Srinivasan, Xiang Li
Lexical Stress and Velar Palatalization in Italian: A spatio-temporal Interaction
Bowei Shao, Philipp Buech, Anne Hermes et al.
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita et al.
LightClone: Speaker-guided Parallel Subnet Selection for Few-shot Voice Cloning
Jie Wu, Jian Luan, Yujun Wang
LightVoc: An Upsampling-Free GAN Vocoder Based On Conformer And Inverse Short-time Fourier Transform
Dinh Son Dang, Tung Lam Nguyen, Bao Thang Ta et al.
Lightweight and Efficient Spoken Language Identification of Long-form Audio
Winstead Zhu, Md Iftekhar Tanveer, Yang Janet Liu et al.
Listener sensitivity to deviating obstruents in WaveNet
Ayushi Pandey, Jens Edlund, Sébastien Le Maguer et al.
Listening To Silences In Contact Center Conversations Using Textual Cues
Digvijay Anil Ingle, Ayush Kumar, Jithendra Vepa
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation
Yanjie Fu, Meng Ge, Honglong Wang et al.
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus
Zhaoqing Li, Tianzi Wang, Jiajun Deng et al.
Low-complexity Broadband Beampattern Synthesis using Array Response Control
Jiayi Xu, Jian Li, Weixin Meng et al.
Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Pin-Jie Lin, Muhammed Saeed, Ernie Chang et al.