Ryo Masumura

57 papers · 2015–2026 · 7 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (22) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🏃 Academic Marathon (11) 🐝 Cross-Pollinator (10) 🏠 Conference Loyalist (45) 🔬 Deep Specialist (20) 🧬 Topic Evolution 🏆 Keyword Champion (3) 🤝 Dynamic Duo (28) 💎 Century Club (56) 📈 Trend Setter 🚀 Conference Pioneer ⚡ Prolific Year (5) 🔥 Unstoppable (12) ❓ The Questioner 🗃️ Keyword Collector (87)

Conferences

INTERSPEECH (45) AAAI (3) COLING (2) EMNLP (2) ICCV (2) IJCNLP (2) WACV (1)

Top co-authors

Tomohiro Tanaka (29) Mana Ihori (19) Naoki Makishima (16) Takafumi Moriya (15) Atsushi Ando (15) Shota Orihashi (15) Hiroshi Sato (13) Yushi Aono (12) Nobukatsu Hojo (11) Taichi Asami (10)

Research topics

Speech & Audio (2)

Keywords

automatic speech recognition (13) speech recognition (5) deep neural network (5) multimodal learning (4) attention mechanism (4) multi-task learning (4) joint modeling (3) autoregressive model (3) knowledge distillation (3) multi-talker speech (3) overlapped speech (3) recurrent neural network (3) autoregressive modeling (3) neural network (3) semi-supervised learning (2) speech analysis (2) representation learning (2) speaker verification (2) long short-term memory (2) domain adaptation (2)

Papers

Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models · AAAI 2026
Distribution Highlighted Reference-based Label Distribution Learning for Facial Age Estimation · WACV 2026
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost · ICCV 2025
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind · AAAI 2025
Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores · AAAI 2025
Unified Multi-Talker ASR with and without Target-speaker Enrollment · INTERSPEECH 2024
Factor-Conditioned Speaking-Style Captioning · INTERSPEECH 2024
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization · INTERSPEECH 2024
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding · INTERSPEECH 2024
Participant-Pair-Wise Bottleneck Transformer for Engagement Estimation from Video Conversation · INTERSPEECH 2024
Learning from Multiple Annotator Biased Labels in Multimodal Conversation · INTERSPEECH 2024
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss · INTERSPEECH 2023
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model · INTERSPEECH 2023
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff · ICCV 2023
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction · INTERSPEECH 2023
End-to-End Joint Target and Non-Target Speakers ASR · INTERSPEECH 2023
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer · INTERSPEECH 2023
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation · INTERSPEECH 2023
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data · INTERSPEECH 2023
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations · INTERSPEECH 2022
Multi-Perspective Document Revision · COLING 2022
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition · INTERSPEECH 2022
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis · INTERSPEECH 2022
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training · INTERSPEECH 2022
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information · INTERSPEECH 2022
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks · INTERSPEECH 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data · INTERSPEECH 2022
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning · INTERSPEECH 2021
Enrollment-Less Training for Personalized Voice Activity Detection · INTERSPEECH 2021
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens · INTERSPEECH 2021
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture · INTERSPEECH 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation · INTERSPEECH 2021
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition · INTERSPEECH 2021
Self-Distillation for Improving CTC-Transformer-Based ASR Systems · INTERSPEECH 2020
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis · INTERSPEECH 2020
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition · INTERSPEECH 2020
A Transformer-Based Audio Captioning Model with Keyword Estimation · INTERSPEECH 2020
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training · INTERSPEECH 2020
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model · INTERSPEECH 2019
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition · INTERSPEECH 2019
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge · INTERSPEECH 2019
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders · INTERSPEECH 2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models · INTERSPEECH 2019
Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification · EMNLP 2018
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training · INTERSPEECH 2018
Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling · COLING 2018
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder · INTERSPEECH 2018
Neural Error Corrective Language Models for Automatic Speech Recognition · INTERSPEECH 2018
Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels · IJCNLP 2017
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification · INTERSPEECH 2017
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls · INTERSPEECH 2017
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks · INTERSPEECH 2017
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis · INTERSPEECH 2017
Hyperspherical Query Likelihood Models with Word Embeddings · IJCNLP 2017
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs · INTERSPEECH 2016
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features · INTERSPEECH 2016
Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks · EMNLP 2015