Naoyuki Kanda

28 papers · 2016–2024 · 3 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🗺️ Taxonomy Completionist (19) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (3)

🗺️ Taxonomy Completionist (19) 🧭 Keyword Pioneer 🏃 Academic Marathon (8) 🏠 Conference Loyalist (26) 🏆 Keyword Champion (3) 🧬 Topic Evolution 👥 Mega-Team (20) 🔬 Deep Specialist (15) 🤝 Dynamic Duo (14) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (5) 💎 Century Club (28) 🗃️ Keyword Collector (54) 🔥 Unstoppable (7)

Conferences

INTERSPEECH (26) AAAI (1) NAACL (1)

Top co-authors

Takuya Yoshioka (14) Xiaofei Wang (11) Jinyu Li (10) Zhong Meng (10) Zhuo Chen (9) Yashesh Gaur (7) Yu Wu (6) Yifan Gong (5) Kenji Nagamatsu (5) Shota Horiguchi (4)

Keywords

automatic speech recognition (8) speaker diarization (5) speaker identification (4) word error rate (4) speech separation (4) end-to-end speech recognition (4) speech recognition (4) serialized output training (3) multimodal learning (3) speaker counting (3) end-to-end model (3) transformer transducer (3) acoustic model (3) multi-talker speech recognition (2) overlapped speech (2) semi-supervised learning (2) speaker embedding (2) deep neural network (2) speech enhancement (2) language model (2)

Papers

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS · INTERSPEECH 2024
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data · NAACL 2024
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription · INTERSPEECH 2024
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems · INTERSPEECH 2024
i-Code: An Integrative and Composable Multimodal Learning Framework · AAAI 2023
Factual Consistency Oriented Speech Recognition · INTERSPEECH 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers · INTERSPEECH 2023
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach · INTERSPEECH 2023
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation · INTERSPEECH 2022
Separating Long-Form Speech with Group-wise Permutation Invariant Training · INTERSPEECH 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings · INTERSPEECH 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition · INTERSPEECH 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training · INTERSPEECH 2022
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone · INTERSPEECH 2021
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer · INTERSPEECH 2021
End-to-End Speaker-Attributed ASR with Transformer · INTERSPEECH 2021
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification · INTERSPEECH 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition · INTERSPEECH 2021
Investigation of Practical Aspects of Single Channel Speech Separation for ASR · INTERSPEECH 2021
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of any Number of Speakers · INTERSPEECH 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition · INTERSPEECH 2020
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition · INTERSPEECH 2019
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR · INTERSPEECH 2019
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation · INTERSPEECH 2019
End-to-End Neural Speaker Diarization with Permutation-Free Objectives · INTERSPEECH 2019
Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models · INTERSPEECH 2018
Maximum a posteriori Based Decoding for CTC Acoustic Models · INTERSPEECH 2016
Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks · INTERSPEECH 2016