Atsushi Ando

18 papers · 2017–2024 · 2 top CS/AI conferences

Achievements

🌍 Conference Polyglot (2) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (10) 🧭 Keyword Pioneer 🏃 Academic Marathon (7)

🌈 Renaissance Researcher (5) 🤝 Dynamic Duo (15) 🏆 Keyword Champion (3) ❓ The Questioner 🗃️ Keyword Collector (91) 💎 Century Club (18)

Conferences

INTERSPEECH (17) ICCV (1)

Top co-authors

Ryo Masumura (15) Naoki Makishima (7) Tomohiro Tanaka (7) Takafumi Moriya (6) Satoshi Suzuki (6) Satoshi Kobashikawa (5) Yushi Aono (5) Shota Orihashi (4) Hosana Kamiyama (4) Hiroshi Sato (4)

Research topics

Speech & Audio (2)

Keywords

automatic speech recognition (7) overlapped speech (3) speech recognition (2) autoregressive modeling (2) autoregressive model (2) speaker embedding (2) hierarchical encoder-decoder (2) recurrent neural network (2) multi-talker speech recognition (2) multi-talker speech (2) acoustic feature (2) spoken language understanding (1) attention mechanism (1) joint learning (1) adversarial training (1) model robustness (1) semi-supervised learning (1) multi-label classification (1) model complexity (1) language modeling (1)

Papers

Unified Multi-Talker ASR with and without Target-speaker Enrollment · INTERSPEECH 2024
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling · INTERSPEECH 2024
SOMSRED: Sequential Output Modeling for Joint Multi-talker Overlapped Speech Recognition and Speaker Diarization · INTERSPEECH 2024
Factor-Conditioned Speaking-Style Captioning · INTERSPEECH 2024
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff · ICCV 2023
End-to-End Joint Target and Non-Target Speakers ASR · INTERSPEECH 2023
Joint Autoregressive Modeling of End-to-End Multi-Talker Overlapped Speech Recognition and Utterance-level Timestamp Prediction · INTERSPEECH 2023
Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data · INTERSPEECH 2022
End-to-End Joint Modeling of Conversation History-Dependent and Independent ASR Systems with Multi-History Training · INTERSPEECH 2022
Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition · INTERSPEECH 2022
Streaming End-to-End Speech Recognition for Hybrid RNN-T/Attention Architecture · INTERSPEECH 2021
Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis · INTERSPEECH 2021
Does the Lombard Effect Improve Emotional Communication in Noise? — Analysis of Emotional Speech Acted in Noise · INTERSPEECH 2019
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models · INTERSPEECH 2019
Speech Emotion Recognition Based on Multi-Label Emotion Existence Model · INTERSPEECH 2019
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training · INTERSPEECH 2018
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder · INTERSPEECH 2018
Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls · INTERSPEECH 2017