speech synthesis

753 papers

Also known as

SSS, SS, TTS

Co-occurring keywords

neural vocoder (126), voice conversion (259), text-to-speech synthesis (293), speech recognition (1223), deep neural network (1801), speech generation (97), low-resource language (2234), automatic speech recognition (1764), generative adversarial network (1939), neural network (6616)

Papers

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings INTERSPEECH 2024

GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion EMNLP 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head AAAI 2024

Direct Speech Synthesis from Non-Invasive, Neuromagnetic Signals INTERSPEECH 2024

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text CVPR 2024

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation ACL 2024

Leveraging the Interplay between Syntactic and Acoustic Cues for Optimizing Korean TTS Pause Formation COLING 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks NeurIPS 2024

Towards Zero-Shot Text-To-Speech for Arabic Dialects ACL 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model EMNLP 2024

Experiments on Speech Synthesis for Teochew, Can Taiwanese Help? COLING 2024

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models INTERSPEECH 2024

An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation COLING 2024

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection NeurIPS 2024

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos AAAI 2024

PitchFlow: adding pitch control to a Flow-matching based TTS model INTERSPEECH 2024

Well, what can you do with messy data? Exploring the prosody and pragmatic function of the discourse marker "well" with found data and speech synthesis INTERSPEECH 2024

Production of phrases by mechanical models of the human vocal tract INTERSPEECH 2024

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech EMNLP 2024

Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices. INTERSPEECH 2024

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech EMNLP 2024

Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion AAAI 2024

Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model COLING 2024

CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems INTERSPEECH 2024

TunArTTS: Tunisian Arabic Text-To-Speech Corpus COLING 2024