Papers
GSQA: An End-to-End Model for Generative Spoken Question Answering
Min-Han Shih, Ho-Lam Chung, Yu-Chi Pai et al.
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong et al.
Guided conditioning with predictive network on score-based diffusion model for speech enhancement
Dail Kim, Da-Hee Yang, Donghyun Kim et al.
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim, Hantae Kim, Kyogu Lee
H4C-TTS: Leveraging Multi-Modal Historical Context for Conversational Text-to-Speech
Donghyun Seong, Joon-Hyuk Chang
Harder or Different? Understanding Generalization of Audio Deepfake Detection
Nicolas M. Müller, Nicholas Evans, Hemlata Tak et al.
HarmoNet: Partial DeepFake Detection Network based on Multi-scale HarmoF0 Feature Fusion
Liwei Liu, Huihui Wei, Dongya Liu et al.
Hear Your Face: Face-based voice conversion with F0 estimation
Jaejun Lee, Yoori Oh, Injune Hwang et al.
HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing
Arnon Turetzky, Or Tal, Yael Segal et al.
Hierarchical Distribution Adaptation for Unsupervised Cross-corpus Speech Emotion Recognition
Cheng Lu, Yuan Zong, Yan Zhao et al.
Hierarchical Multi-Task Learning with CTC and Recursive Operation
Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa et al.
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model
Joun Yeop Lee, Myeonghun Jeong, Minchan Kim et al.
Highly Intelligible Speaker-Independent Articulatory Synthesis
Charles McGhee, Kate Knill, Mark Gales
Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
Daniel Haider, Felix Perfler, Vincent Lostanlen et al.
Homograph Disambiguation with Text-to-Text Transfer Transformer
Markéta Řezáčková, Daniel Tihelka, Jindřich Matoušek
How Consistent are Speech-Based Biomarkers in Remote Tracking of ALS Disease Progression Across Languages? A Case Study of English and Dutch
Hardik Kothare, Michael Neumann, Cathy Zhang et al.
How Does Alignment Error Affect Automated Pronunciation Scoring in Children's Speech?
Prad Kadambi, Tristan Mahr, Lucas Annear et al.
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
Tianchi Liu, Lin Zhang, Rohan Kumar Das et al.
How Much Context Does My Attention-Based ASR System Need?
Robert Flynn, Anton Ragni
How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines
Ailin Liu, Pepijn Vunderink, Jose Vargas Quiros et al.
How rhythm metrics are linked to produced and perceived speaker charisma
Oliver Niebuhr, Nafiseh Taghva
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi, Jarod Duret, Salah Zaiem et al.
HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
Ji Won Yoon, Beom Jun Woo, Nam Soo Kim
Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0
Marianne de Heer Kloots, Willem Zuidema
Hybrid-Diarization System with Overlap Post-Processing for the DISPLACE 2024 Challenge
Gabriel Pîrlogeanu, Octavian Pascu, Alexandru-Lucian Georgescu et al.