Hiroshi Saruwatari

30 papers · 2016–2024 · 2 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (14) 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (2)

🏃 Academic Marathon (8) 🗺️ Taxonomy Completionist (14) 🧭 Keyword Pioneer 🏠 Conference Loyalist (29) 🤝 Dynamic Duo (24) 🔬 Deep Specialist (11) 🏆 Keyword Champion (2) 🔥 Unstoppable (5) 🚀 Conference Pioneer ⚡ Prolific Year (6) 🗃️ Keyword Collector (131) 💎 Century Club (30) 📈 Trend Setter

Conferences

INTERSPEECH (29) IJCAI (1)

Top co-authors

Shinnosuke Takamichi (24) Yuki Saito (15) Tomoki Koriyama (9) Kentaro Tachibana (6) Detai Xin (5) Takaaki Saeki (5) Kentaro Seki (5) Wataru Nakata (3) Tomohiko Nakamura (2) Shinji Watanabe (2)

Keywords

speech synthesis (9) voice conversion (5) self-supervised learning (4) speech quality (4) domain adaptation (3) dialogue system (3) empathetic dialogue (3) speech enhancement (3) cross-lingual synthesis (2) speaker adaptation (2) speech corpus (2) speaker embedding (2) language model (2) sequence-to-sequence learning (2) speaker individuality (2) deep neural network (2) deep gaussian process (2) multilingual processing (1) emotion recognition (1) ensemble learning (1)

Papers

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals · INTERSPEECH 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics · INTERSPEECH 2024
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment · INTERSPEECH 2024
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis · INTERSPEECH 2024
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark · INTERSPEECH 2024
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining · IJCAI 2023
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus · INTERSPEECH 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics · INTERSPEECH 2023
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings · INTERSPEECH 2023
HumanDiffusion: diffusion model using perceptual gradients · INTERSPEECH 2023
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center · INTERSPEECH 2023
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis · INTERSPEECH 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent · INTERSPEECH 2022
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis · INTERSPEECH 2022
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS · INTERSPEECH 2022
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History · INTERSPEECH 2022
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling · INTERSPEECH 2022
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022 · INTERSPEECH 2022
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis · INTERSPEECH 2021
Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator · INTERSPEECH 2021
Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer · INTERSPEECH 2021
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU · INTERSPEECH 2020
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis · INTERSPEECH 2020
End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention · INTERSPEECH 2020
Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals · INTERSPEECH 2020
Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes · INTERSPEECH 2020
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space · INTERSPEECH 2020
Sampling-Based Speech Parameter Generation Using Moment-Matching Networks · INTERSPEECH 2017
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities · INTERSPEECH 2017
Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech · INTERSPEECH 2016