Kazuhito Koishida

20 papers · 2017–2026 · 7 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (35) 🧬 Topic Evolution 💎 Century Club (19) 🗃️ Keyword Collector (81) ⚡ Prolific Year (6) ❓ The Questioner

Conferences

INTERSPEECH (11) CVPR (2) ICLR (2) ICML (2) ACL (1) EACL (1) NIPS (1)

Top co-authors

Yinheng Li (4) Dung N. Tran (3) Dan Zhao (3) Tianyi Chen (2) Rogerio Bonatti (2) Lawrence Keunho Jang (2) Zheng Hui (2) Saeed Amizadeh (2) David Aponte (2) Justin Wagle (2)

Keywords

speech enhancement (6) knowledge distillation (2) gui grounding (2) multimodal learning (2) convolutional neural network (2) embedding learning (1) visual perception (1) representation learning (1) speaker verification (1) ensemble learning (1) channel attention (1) feature embedding (1) scene graph generation (1) visual grounding (1) attention mechanism (1) deep learning (1) efficient inference (1) speaker embedding (1) visual question answering (1) metric learning (1)

Papers

Do GUI Grounders Truly Understand UI Elements? · EACL 2026
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks · ICLR 2025
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale · ICML 2025
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression · CVPR 2025
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models · ACL 2025
Weakly-supervised Audio Separation via Bi-modal Semantic Similarity · ICLR 2024
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation · INTERSPEECH 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes · INTERSPEECH 2024
Progressive Ensemble Distillation: Building Ensembles for Efficient Inference · NIPS 2023
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks · INTERSPEECH 2023
INTERSPEECH 2021 Deep Noise Suppression Challenge · INTERSPEECH 2021
Single-Channel Speech Enhancement Using Learnable Loss Mixup · INTERSPEECH 2021
Low-Latency Single Channel Speech Dereverberation Using U-Net Convolutional Neural Networks · INTERSPEECH 2020
Single-Channel Speech Enhancement by Subspace Affinity Minimization · INTERSPEECH 2020
MMTM: Multimodal Transfer Module for CNN Fusion · CVPR 2020
Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis · INTERSPEECH 2020
Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments · INTERSPEECH 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" · ICML 2020
Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation · INTERSPEECH 2019
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances · INTERSPEECH 2017