speech synthesis

753 papers

Also known as

SSS, SS, TTS

Co-occurring keywords

neural vocoder (126)
voice conversion (259)
text-to-speech synthesis (293)
speech recognition (1223)
deep neural network (1801)
speech generation (97)
low-resource language (2234)
automatic speech recognition (1764)
generative adversarial network (1939)
neural network (6616)

Papers

From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition EMNLP 2025

Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis EMNLP 2025

End-to-End Multilingual Automatic Dubbing via Duration-based Translation with Large Language Models EMNLP 2025

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering AAAI 2025

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching ACL 2025

The Role of Prosody in Spoken Question Answering NAACL 2025

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation AAAI 2025

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions CVPR 2025

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment ACL 2025

EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing CVPR 2025

Text-to-speech system for low-resource languages: A case study in Shipibo-Konibo (a Panoan language from Peru) NAACL 2025

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis ACL 2025

Intoner: For Chinese Poetry Intoning Synthesis IJCAI 2025

Phoneme Hallucinator: One-Shot Voice Conversion via Set Expansion AAAI 2024

TokSing: Singing Voice Synthesis based on Discrete Tokens INTERSPEECH 2024

TunArTTS: Tunisian Arabic Text-To-Speech Corpus COLING 2024

Lifelong Learning MOS Prediction for Synthetic Speech Quality Evaluation INTERSPEECH 2024

Contextual Interactive Evaluation of TTS Models in Dialogue Systems INTERSPEECH 2024

Probing the Feasibility of Multilingual Speaker Anonymization INTERSPEECH 2024

Simulating articulatory trajectories with phonological feature interpolation INTERSPEECH 2024

Pre-training Neural Transducer-based Streaming Voice Conversion for Faster Convergence and Alignment-free Training INTERSPEECH 2024

Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model COLING 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers INTERSPEECH 2024

Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings INTERSPEECH 2024

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of Speech-Silence and Word-Punctuation INTERSPEECH 2024