conftrace_

Ryo Masumura

57 papers · 2015–2026 · 7 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (22)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🌈 Renaissance Researcher (6)
🐣 Hot Topic Early Bird
🏃 Academic Marathon (11)
🐝 Cross-Pollinator (10)
🏠 Conference Loyalist (45)
🔬 Deep Specialist (20)
🧬 Topic Evolution
🏆 Keyword Champion (3)
🤝 Dynamic Duo (28)
💎 Century Club (56)
📈 Trend Setter
🚀 Conference Pioneer
⚡ Prolific Year (5)
🔥 Unstoppable (12)
❓ The Questioner
🗃️ Keyword Collector (87)

Conferences

INTERSPEECH (45) AAAI (3) COLING (2) EMNLP (2) ICCV (2) IJCNLP (2) WACV (1)

Papers

Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models · AAAI 2026
Distribution Highlighted Reference-based Label Distribution Learning for Facial Age Estimation · WACV 2026
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost · ICCV 2025
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind · AAAI 2025
Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores · AAAI 2025
Unified Multi-Talker ASR with and without Target-speaker Enrollment · INTERSPEECH 2024
Factor-Conditioned Speaking-Style Captioning · INTERSPEECH 2024
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization · INTERSPEECH 2024
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding · INTERSPEECH 2024
Participant-Pair-Wise Bottleneck Transformer for Engagement Estimation from Video Conversation · INTERSPEECH 2024
Learning from Multiple Annotator Biased Labels in Multimodal Conversation · INTERSPEECH 2024
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss · INTERSPEECH 2023
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model · INTERSPEECH 2023
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff · ICCV 2023
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction · INTERSPEECH 2023
End-to-End Joint Target and Non-Target Speakers ASR · INTERSPEECH 2023
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer · INTERSPEECH 2023
What are differences? Comparing DNN and Human by Their Performance and Characteristics in Speaker Age Estimation · INTERSPEECH 2023
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data · INTERSPEECH 2023
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations · INTERSPEECH 2022
Multi-Perspective Document Revision · COLING 2022
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition · INTERSPEECH 2022
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis · INTERSPEECH 2022
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training · INTERSPEECH 2022
Dialogue Acts Aided Important Utterance Detection Based on Multiparty and Multimodal Information · INTERSPEECH 2022
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks · INTERSPEECH 2022
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data · INTERSPEECH 2022
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning · INTERSPEECH 2021
Enrollment-Less Training for Personalized Voice Activity Detection · INTERSPEECH 2021
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens · INTERSPEECH 2021
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture · INTERSPEECH 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation · INTERSPEECH 2021
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition · INTERSPEECH 2021
Self-Distillation for Improving CTC-Transformer-Based ASR Systems · INTERSPEECH 2020
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis · INTERSPEECH 2020
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition · INTERSPEECH 2020
A Transformer-Based Audio Captioning Model with Keyword Estimation · INTERSPEECH 2020
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training · INTERSPEECH 2020
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model · INTERSPEECH 2019
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition · INTERSPEECH 2019
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge · INTERSPEECH 2019
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders · INTERSPEECH 2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models · INTERSPEECH 2019
Adversarial Training for Multi-task and Multi-lingual Joint Modeling of Utterance Intent Classification · EMNLP 2018
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training · INTERSPEECH 2018
Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling · COLING 2018
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder · INTERSPEECH 2018
Neural Error Corrective Language Models for Automatic Speech Recognition · INTERSPEECH 2018
Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels · IJCNLP 2017
Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification · INTERSPEECH 2017
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls · INTERSPEECH 2017
Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks · INTERSPEECH 2017
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis · INTERSPEECH 2017
Hyperspherical Query Likelihood Models with Word Embeddings · IJCNLP 2017
Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs · INTERSPEECH 2016
Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features · INTERSPEECH 2016
Hierarchical Latent Words Language Models for Robust Modeling to Out-Of Domain Tasks · EMNLP 2015