Ryo Masumura
57 papers · 2015–2026 · 7 top CS/AI conferences
Achievements
Taxonomy Completionist (22)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Academic Marathon (11)
Cross-Pollinator (10)
Conference Loyalist (45)
Deep Specialist (20)
Topic Evolution
Keyword Champion (3)
Dynamic Duo (28)
Century Club (56)
Trend Setter
Conference Pioneer
Prolific Year (5)
Unstoppable (12)
The Questioner
Keyword Collector (87)
Conferences
INTERSPEECH (45)
AAAI (3)
COLING (2)
EMNLP (2)
ICCV (2)
IJCNLP (2)
WACV (1)
Keywords
automatic speech recognition (13)
speech recognition (5)
deep neural network (5)
multimodal learning (4)
attention mechanism (4)
multi-task learning (4)
joint modeling (3)
autoregressive model (3)
knowledge distillation (3)
multi-talker speech (3)
overlapped speech (3)
recurrent neural network (3)
autoregressive modeling (3)
neural network (3)
semi-supervised learning (2)
speech analysis (2)
representation learning (2)
speaker verification (2)
long short-term memory (2)
domain adaptation (2)
Papers
Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models
AAAI 2026
Distribution Highlighted Reference-based Label Distribution Learning for Facial Age Estimation
WACV 2026
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost
ICCV 2025
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind
AAAI 2025
Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores
AAAI 2025
Unified Multi-Talker ASR with and without Target-speaker Enrollment
INTERSPEECH 2024
Factor-Conditioned Speaking-Style Captioning
INTERSPEECH 2024
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization
INTERSPEECH 2024
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
INTERSPEECH 2024
Participant-Pair-Wise Bottleneck Transformer for Engagement Estimation from Video Conversation
INTERSPEECH 2024
Learning from Multiple Annotator Biased Labels in Multimodal Conversation
INTERSPEECH 2024
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
INTERSPEECH 2023
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model
INTERSPEECH 2023
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
ICCV 2023
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction
INTERSPEECH 2023
End-to-End Joint Target and Non-Target Speakers ASR
INTERSPEECH 2023
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer
INTERSPEECH 2023
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation
INTERSPEECH 2023
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data
INTERSPEECH 2023
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
INTERSPEECH 2022
Multi-Perspective Document Revision
COLING 2022
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition
INTERSPEECH 2022
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis
INTERSPEECH 2022
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training
INTERSPEECH 2022
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information
INTERSPEECH 2022
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks
INTERSPEECH 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data
INTERSPEECH 2022
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning
INTERSPEECH 2021
Enrollment-Less Training for Personalized Voice Activity Detection
INTERSPEECH 2021
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens
INTERSPEECH 2021
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture
INTERSPEECH 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation
INTERSPEECH 2021
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition
INTERSPEECH 2021
Self-Distillation for Improving CTC-Transformer-Based ASR Systems
INTERSPEECH 2020
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis
INTERSPEECH 2020
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition
INTERSPEECH 2020
A Transformer-Based Audio Captioning Model with Keyword Estimation
INTERSPEECH 2020
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training
INTERSPEECH 2020
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model
INTERSPEECH 2019
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition
INTERSPEECH 2019
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge
INTERSPEECH 2019
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
INTERSPEECH 2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models
INTERSPEECH 2019
Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification
EMNLP 2018
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
INTERSPEECH 2018
Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling
COLING 2018
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
INTERSPEECH 2018
Neural Error Corrective Language Models for Automatic Speech Recognition
INTERSPEECH 2018
Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
IJCNLP 2017
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification
INTERSPEECH 2017
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls
INTERSPEECH 2017
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks
INTERSPEECH 2017
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis
INTERSPEECH 2017
Hyperspherical Query Likelihood Models with Word Embeddings
IJCNLP 2017
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs
INTERSPEECH 2016
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features
INTERSPEECH 2016
Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks
EMNLP 2015