speech synthesis

753 papers

Also known as

SSS, SS, TTS

Co-occurring keywords

neural vocoder (126), voice conversion (259), text-to-speech synthesis (293), speech recognition (1223), deep neural network (1801), speech generation (97), low-resource language (2234), automatic speech recognition (1764), generative adversarial network (1939), neural network (6616)

Papers

EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot ACL 2024

TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech INTERSPEECH 2024

Deepfake Defense: Constructing and Evaluating a Specialized Urdu Deepfake Audio Dataset ACL 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment INTERSPEECH 2024

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis INTERSPEECH 2024

Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies INTERSPEECH 2024

Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity INTERSPEECH 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild ACL 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation ACL 2024

Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model EMNLP 2024

Simulating articulatory trajectories with phonological feature interpolation INTERSPEECH 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers INTERSPEECH 2024

Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices INTERSPEECH 2024

Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis INTERSPEECH 2024

Examining Prosody in Spoken Navigation Instructions for People with Disabilities NAACL 2024

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation AAAI 2024

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning ACL 2024

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control EMNLP 2024

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions INTERSPEECH 2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality INTERSPEECH 2023

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting NeurIPS 2023

ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models INTERSPEECH 2023

OverFlow: Putting flows on top of neural transducers for better TTS INTERSPEECH 2023

FOOCTTS: Generating Arabic Speech with Acoustic Environment for Football Commentator INTERSPEECH 2023

Reverberation-Controllable Voice Conversion Using Reverberation Time Estimator INTERSPEECH 2023