GSQA: An End-to-End Model for Generative Spoken Question Answering

Min-Han Shih, Ho-Lam Chung, Yu-Chi Pai et al.

2024 INTERSPEECH

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong et al.

2024 INTERSPEECH

Guided conditioning with predictive network on score-based diffusion model for speech enhancement

Dail Kim, Da-Hee Yang, Donghyun Kim et al.

2024 INTERSPEECH

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim, Hantae Kim, Kyogu Lee

2024 INTERSPEECH

H4C-TTS: Leveraging Multi-Modal Historical Context for Conversational Text-to-Speech

Donghyun Seong, Joon-Hyuk Chang

2024 INTERSPEECH

Harder or Different? Understanding Generalization of Audio Deepfake Detection

Nicolas M. Müller, Nicholas Evans, Hemlata Tak et al.

2024 INTERSPEECH

HarmoNet: Partial DeepFake Detection Network based on Multi-scale HarmoF0 Feature Fusion

Liwei Liu, Huihui Wei, Dongya Liu et al.

2024 INTERSPEECH

Hear Your Face: Face-based voice conversion with F0 estimation

Jaejun Lee, Yoori Oh, Injune Hwang et al.

2024 INTERSPEECH

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing

Arnon Turetzky, Or Tal, Yael Segal et al.

2024 INTERSPEECH

Hierarchical Distribution Adaptation for Unsupervised Cross-corpus Speech Emotion Recognition

Cheng Lu, Yuan Zong, Yan Zhao et al.

2024 INTERSPEECH

Hierarchical Multi-Task Learning with CTC and Recursive Operation

Nahomi Kusunoki, Yosuke Higuchi, Tetsuji Ogawa et al.

2024 INTERSPEECH

High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model

Joun Yeop Lee, Myeonghun Jeong, Minchan Kim et al.

2024 INTERSPEECH

Highly Intelligible Speaker-Independent Articulatory Synthesis

Charles McGhee, Kate Knill, Mark Gales

2024 INTERSPEECH

Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement

Daniel Haider, Felix Perfler, Vincent Lostanlen et al.

2024 INTERSPEECH

Homograph Disambiguation with Text-to-Text Transfer Transformer

Markéta Řezáčková, Daniel Tihelka, Jindřich Matoušek

2024 INTERSPEECH

How Consistent are Speech-Based Biomarkers in Remote Tracking of ALS Disease Progression Across Languages? A Case Study of English and Dutch

Hardik Kothare, Michael Neumann, Cathy Zhang et al.

2024 INTERSPEECH

How Does Alignment Error Affect Automated Pronunciation Scoring in Children's Speech?

Prad Kadambi, Tristan Mahr, Lucas Annear et al.

2024 INTERSPEECH

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

Tianchi Liu, Lin Zhang, Rohan Kumar Das et al.

2024 INTERSPEECH

How Much Context Does My Attention-Based ASR System Need?

Robert Flynn, Anton Ragni

2024 INTERSPEECH

How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines

Ailin Liu, Pepijn Vunderink, Jose Vargas Quiros et al.

2024 INTERSPEECH

How rhythm metrics are linked to produced and perceived speaker charisma

Oliver Niebuhr, Nafiseh Taghva

2024 INTERSPEECH

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

Pooneh Mousavi, Jarod Duret, Salah Zaiem et al.

2024 INTERSPEECH

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

Ji Won Yoon, Beom Jun Woo, Nam Soo Kim

2024 INTERSPEECH

Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0

Marianne de Heer Kloots, Willem Zuidema

2024 INTERSPEECH

Hybrid-Diarization System with Overlap Post-Processing for the DISPLACE 2024 Challenge

Gabriel Pîrlogeanu, Octavian Pascu, Alexandru-Lucian Georgescu et al.

2024 INTERSPEECH
