Tomohiro Tanaka

33 papers · 2018–2026 · 3 top CS/AI conferences

Achievements

🌍 Conference Polyglot (3) 🗺️ Taxonomy Completionist (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🏃 Academic Marathon (7) 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (29) 🤝 Dynamic Duo (28) 🧬 Topic Evolution 🔬 Deep Specialist (18) 🏆 Keyword Champion (2) ⚡ Prolific Year (7) ❓ The Questioner 🗃️ Keyword Collector (151) 🔥 Unstoppable (8) 💎 Century Club (32)

Conferences

INTERSPEECH (29) AAAI (2) COLING (2)

Top co-authors

Ryo Masumura (29) Mana Ihori (19) Takafumi Moriya (16) Naoki Makishima (13) Shota Orihashi (12) Hiroshi Sato (12) Takanori Ashihara (9) Akihiko Takashima (8) Atsushi Ando (7) Marc Delcroix (7)

Research topics

Speech & Audio (1)

Keywords

automatic speech recognition (10) self-supervised learning (4) speech recognition (3) language model (3) knowledge distillation (3) speech representation (3) multi-task learning (3) end-to-end automatic speech recognition (2) overlapped speech (2) multi-talker speech (2) transfer learning (2) multimodal transformer (2) attention mechanism (2) autoregressive modeling (2) speaker verification (2) joint modeling (2) spoken language understanding (2) autoregressive model (2) hierarchical encoder-decoder (2) neural network (2)

Papers

Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models · AAAI 2026
Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores · AAAI 2025
Unified Multi-Talker ASR with and without Target-speaker Enrollment · INTERSPEECH 2024
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization · INTERSPEECH 2024
Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization · INTERSPEECH 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge? · INTERSPEECH 2023
Audio-Visual Praise Estimation for Conversational Video based on Synchronization-Guided Multimodal Transformer · INTERSPEECH 2023
Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data · INTERSPEECH 2023
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss · INTERSPEECH 2023
Transcribing Speech as Spoken and Written Dual Text Using an Autoregressive Model · INTERSPEECH 2023
End-to-End Joint Target and Non-Target Speakers ASR · INTERSPEECH 2023
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations · INTERSPEECH 2022
Domain Adversarial Self-Supervised Speech Representation Learning for Improving Unknown Domain Downstream Tasks · INTERSPEECH 2022
Multi-Perspective Document Revision · COLING 2022
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training · INTERSPEECH 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models · INTERSPEECH 2022
Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition · INTERSPEECH 2021
Enrollment-Less Training for Personalized Voice Activity Detection · INTERSPEECH 2021
Zero-Shot Joint Modeling of Multiple Spoken-Text-Style Conversion Tasks Using Switching Tokens · INTERSPEECH 2021
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture · INTERSPEECH 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation · INTERSPEECH 2021
End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning · INTERSPEECH 2021
Sound-Image Grounding Based Focusing Mechanism for Efficient Automatic Spoken Language Acquisition · INTERSPEECH 2020
Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition · INTERSPEECH 2020
Unsupervised Domain Adaptation for Dialogue Sequence Labeling Based on Hierarchical Adversarial Training · INTERSPEECH 2020
Self-Distillation for Improving CTC-Transformer-Based ASR Systems · INTERSPEECH 2020
Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition · INTERSPEECH 2019
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge · INTERSPEECH 2019
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders · INTERSPEECH 2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models · INTERSPEECH 2019
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder · INTERSPEECH 2018
Neural Error Corrective Language Models for Automatic Speech Recognition · INTERSPEECH 2018
Multi-task and Multi-lingual Joint Learning of Neural Lexical Utterance Classification based on Partially-shared Modeling · COLING 2018