Papers
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Yuanjun Lv, Hai Li, Ying Yan et al.
Frequency-mix Knowledge Distillation for Fake Speech Detection
Cunhang Fan, Shunbo Dong, Jun Xue et al.
Frication noise features of Polish voiceless dental fricative and affricate produced by children with and without speech disorder
Zuzanna Miodonska, Michal Kręcichwost, Ewa Kwaśniok et al.
From Sound to Meaning in the Auditory Cortex: A Neuronal Representation and Classification Analysis
Kumar Neelabh, Vishnu Sreekumar
From Text to Emotion: Unveiling the Emotion Annotation Capabilities of LLMs
Minxue Niu, Mimansa Jaiswal, Emily Mower Provost
Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor
Yongjie Si, Yanxiong Li, Jialong Li et al.
FVTTS : Face Based Voice Synthesis for Text-to-Speech
Minyoung Lee, Eunil Park, Sungeun Hong
G2PA: G2P with Aligned Audio for Mandarin Chinese
Xingxing Yang
Gender and age based f0-variation in the German Plapper Corpus
Melanie Weirich, Daniel Duran, Stefanie Jannedy
Gender and Language Identification in Multilingual Models of Speech: Exploring the Genericity and Robustness of Speech Representations
Séverine Guillaume, Maxime Fily, Alexis Michaud et al.
Gender Representation in TV and Radio: Automatic Information Extraction methods versus Manual Analyses
David Doukhan, Lena Dodson, Manon Conan et al.
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
Yingying Gao, Shilei Zhang, Chao Deng et al.
Generalized Fake Audio Detection via Deep Stable Learning
Zhiyong Wang, Ruibo Fu, Zhengqi Wen et al.
Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy
Yuankun Xie, Ruibo Fu, Zhengqi Wen et al.
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
Zhengyang Chen, Xuechen Liu, Erica Cooper et al.
Genhancer: High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens
Haici Yang, Jiaqi Su, Minje Kim et al.
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
Xiaopeng Wang, Ruibo Fu, Zhengqi Wen et al.
Getting More for Less: Using Weak Labels and AV-Mixup for Robust Audio-Visual Speaker Verification
Anith Selvakumar, Homa Fashandi
Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting
Shuai Wang, Dehao Zhang, Kexin Shi et al.
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang, Yang Song, Sanjay Jha
Glottal inverse filtering and vocal tract tuning for the numerical simulation of vowel /a/ with different levels of vocal effort
Marc Freixes, Marc Arnela, Joan Claudi Socoró et al.
GPA: Global and Prototype Alignment for Audio-Text Retrieval
Yuxin Xie, Zhihong Zhu, Xianwei Zhuang et al.
Graph Attention Based Multi-Channel U-Net for Speech Dereverberation With Ad-Hoc Microphone Arrays
Hongmei Guo, Yijiang Chen, Xiao-Lei Zhang et al.
Gryannote open-source speaker diarization labeling tool
Clément Pages, Hervé Bredin