Bowen Shi

37 papers · 2019–2026 · 11 top CS/AI venues

Achievements

🧭 Keyword Pioneer 🌈 Renaissance Researcher (5) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (16) 🐣 Hot Topic Early Bird 🔬 Deep Specialist (10) 🏆 Keyword Champion (2) 🤝 Dynamic Duo (13) 🏆 Grand Slam 🗃️ Keyword Collector (137) ⚡ Prolific Year (8) 📈 Trend Setter 💎 Century Club (35) 🔥 Unstoppable (7)

Conferences

INTERSPEECH (8) ACL (7) ICLR (4) CVPR (3) EMNLP (3) ICML (3) NIPS (3) ECCV (2) ICCV (2) AAAI (1) JMLR (1)

Top co-authors

Wei-Ning Hsu (13) Wenrui Dai (9) Hongkai Xiong (9) Karen Livescu (8) XIAOPENG ZHANG (8) Yaoming Wang (7) Qi Tian (7) Jin Li (6) Diane Brentari (6) Chenglin Li (6)

Keywords

video understanding (5) sign language translation (4) self-supervised learning (4) representation learning (3) audio-visual speech recognition (3) sign language recognition (3) zero-shot learning (3) speech recognition (3) american sign language (3) multimodal learning (3) speech generation (2) audio-visual speech (2) vision transformer (2) speech translation (2) automatic speech recognition (2) model quantization (2) model compression (2) multi-task learning (2) speech synthesis (2) self-supervised pretraining (2)

Papers

Profiling-Free Mixed-Precision Quantization for MoE LLMs via Fuzzy Rule Interpolation · ACL 2026
CT-FineBench: A Diagnostic Fidelity Benchmark for Fine-Grained Evaluation of CT Report Generation · ACL 2026
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models · ICCV 2025
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following · ACL 2025
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation · ICML 2024
BarLeRIa: An Efficient Tuning Framework for Referring Image Segmentation · ICLR 2024
Scaling Speech Technology to 1,000+ Languages · JMLR 2024
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners · ICLR 2024
Generative Pre-training for Speech with Flow Matching · ICLR 2024
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning · INTERSPEECH 2024
Towards Privacy-Aware Sign Language Translation at Scale · ACL 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception · ACL 2024
Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation · ICML 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding · ECCV 2024
Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation · AAAI 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale · NIPS 2023
Adapting Shortcut With Normalizing Flow: An Efficient Tuning Framework for Visual Recognition · CVPR 2023
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration · CVPR 2023
TTIC’s Submission to WMT-SLT 23 · EMNLP 2023
SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning · ICML 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation · INTERSPEECH 2023
Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis · INTERSPEECH 2023
AiluRus: A Scalable ViT Framework for Dense Prediction · NIPS 2023
TTIC’s WMT-SLT 22 Sign Language Translation System · EMNLP 2022
Open-Domain Sign Language Translation Learned from Online Video · EMNLP 2022
A Transformer-Based Decoder for Semantic Segmentation with Multi-level Context Mining · ECCV 2022
Robust Self-Supervised Audio-Visual Speech Recognition · INTERSPEECH 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT · INTERSPEECH 2022
Searching for fingerspelled content in American Sign Language · ACL 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction · ICLR 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality · NIPS 2022
Fingerspelling Detection in American Sign Language · CVPR 2021
A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling · INTERSPEECH 2020
A Cross-Task Analysis of Text Span Representations · ACL 2020
Compression of Acoustic Event Detection Models with Quantized Distillation · INTERSPEECH 2019
On the Contributions of Visual and Textual Supervision in Low-Resource Semantic Speech Retrieval · INTERSPEECH 2019
Fingerspelling Recognition in the Wild With Iterative Visual Attention · ICCV 2019