Di Hu

39 papers · 2016–2025 · 13 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (13) 🐝 Cross-Pollinator (12)

🌈 Renaissance Researcher (10) 🗺️ Taxonomy Completionist (53) 🔬 Deep Specialist (12) 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion (3) 💎 Century Club (39) 🔥 Unstoppable (7) ❓ The Questioner 🗃️ Keyword Collector (140) ⚡ Prolific Year (12) 📈 Trend Setter

Conferences

CVPR (12) ECCV (6) AAAI (5) ICML (3) CORL (2) ICCV (2) ICLR (2) WACV (2) ACL (1) ACML (1) INTERSPEECH (1) NIPS (1) RSS (1)

Top co-authors

Yake Wei (9) Xuelong Li (7) Dong Wang (6) Guangyao Li (5) Ruoxuan Feng (5) Hang Zhou (4) Dongzhan Zhou (4) Wenke Xia (4) Yaoting Wang (4) Dejing Dou (3)

Research topics

Robotics (1)

Keywords

multimodal learning (8) audio-visual learning (5) sound source localization (3) self-supervised learning (3) sound separation (3) scene understanding (2) visual sound (2) multi-modal learning (2) multimodal large language model (2) graph neural network (2) domain adaptation (2) temporal modeling (2) video understanding (2) representation learning (2) sound localization (2) audiovisual learning (2) cross-modal learning (2) source separation (1) feature extraction (1) image generation (1)

Papers

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition · CVPR 2025
Efficient Quantification of Multimodal Interaction at Sample Level · ICML 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer · ICML 2025
Towards Effective and Efficient Continual Pre-training of Large Language Models · ACL 2025
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors · ICLR 2025
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction · CVPR 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation · CVPR 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception · CVPR 2025
Diagnosing and Re-learning for Balanced Multimodal Learning · ECCV 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance · ICML 2024
Enhancing Multimodal Cooperation via Sample-level Modality Valuation · CVPR 2024
Quantifying and Enhancing Multi-modal Robustness with Modality Preference · ICLR 2024
Can Textual Semantics Mitigate Sounding Object Segmentation Preference? · ECCV 2024
Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation · CORL 2024
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance · CORL 2024
Learning Manipulation by Predicting Interaction · RSS 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes · ECCV 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation · ECCV 2024
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer · AAAI 2024
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model · AAAI 2024
SeCo: Separating Unknown Musical Visual Sounds With Consistency Guidance · WACV 2023
Towards Inadequately Pre-trained Models in Transfer Learning · ICCV 2023
Multi-Scale Attention for Audio Question Answering · INTERSPEECH 2023
Exploiting Visual Context Semantics for Sound Source Localization · WACV 2023
Balanced Multimodal Learning via On-the-Fly Gradient Modulation · CVPR 2022
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation · AAAI 2022
Learning To Answer Questions in Dynamic Audio-Visual Scenarios · CVPR 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing · AAAI 2022
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification · CVPR 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation · CVPR 2021
Temporal Relational Modeling with Self-Supervision for Action Segmentation · AAAI 2021
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition · ECCV 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching · NIPS 2020
Multiple Sound Sources Localization from Coarse to Fine · ECCV 2020
Listen to the Image · CVPR 2019
Deep Multimodal Clustering for Unsupervised Audiovisual Learning · CVPR 2019
Multivariate Time Series Prediction Based on Optimized Temporal Convolutional Networks with Stacked Auto-encoders · ACML 2019
Image2song: Song Retrieval via Bridging Image Content and Lyric Words · ICCV 2017
Temporal Multimodal Learning in Audiovisual Speech Recognition · CVPR 2016