Di Hu
39 papers · 2016–2025 · 13 top CS/AI conferences
Achievements
Academic Marathon (9)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (13)
Cross-Pollinator (12)
Renaissance Researcher (10)
Taxonomy Completionist (53)
Deep Specialist (12)
Grand Slam
Topic Evolution
Keyword Champion (3)
Century Club (39)
Unstoppable (7)
The Questioner
Keyword Collector (140)
Prolific Year (12)
Trend Setter
Conferences
CVPR (12)
ECCV (6)
AAAI (5)
ICML (3)
CORL (2)
ICCV (2)
ICLR (2)
WACV (2)
ACL (1)
ACML (1)
INTERSPEECH (1)
NIPS (1)
RSS (1)
Research topics
Keywords
multimodal learning (8)
audio-visual learning (5)
sound source localization (3)
self-supervised learning (3)
sound separation (3)
scene understanding (2)
visual sound (2)
multi-modal learning (2)
multimodal large language model (2)
graph neural network (2)
domain adaptation (2)
temporal modeling (2)
video understanding (2)
representation learning (2)
sound localization (2)
audiovisual learning (2)
cross-modal learning (2)
source separation (1)
feature extraction (1)
image generation (1)
Papers
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
CVPR 2025
Efficient Quantification of Multimodal Interaction at Sample Level
ICML 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
ICML 2025
Towards Effective and Efficient Continual Pre-training of Large Language Models
ACL 2025
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors
ICLR 2025
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
CVPR 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
CVPR 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
CVPR 2025
Diagnosing and Re-learning for Balanced Multimodal Learning
ECCV 2024
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
ICML 2024
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
CVPR 2024
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
ICLR 2024
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
ECCV 2024
Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation
CORL 2024
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
CORL 2024
Learning Manipulation by Predicting Interaction
RSS 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
ECCV 2024
Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
ECCV 2024
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
AAAI 2024
SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
AAAI 2024
SeCo: Separating Unknown Musical Visual Sounds With Consistency Guidance
WACV 2023
Towards Inadequately Pre-trained Models in Transfer Learning
ICCV 2023
Multi-Scale Attention for Audio Question Answering
INTERSPEECH 2023
Exploiting Visual Context Semantics for Sound Source Localization
WACV 2023
Balanced Multimodal Learning via On-the-Fly Gradient Modulation
CVPR 2022
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation
AAAI 2022
Learning To Answer Questions in Dynamic Audio-Visual Scenarios
CVPR 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI 2022
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
CVPR 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
CVPR 2021
Temporal Relational Modeling with Self-Supervision for Action Segmentation
AAAI 2021
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
ECCV 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
NIPS 2020
Multiple Sound Sources Localization from Coarse to Fine
ECCV 2020
Listen to the Image
CVPR 2019
Deep Multimodal Clustering for Unsupervised Audiovisual Learning
CVPR 2019
Multivariate Time Series Prediction Based on Optimized Temporal Convolutional Networks with Stacked Auto-encoders
ACML 2019
Image2song: Song Retrieval via Bridging Image Content and Lyric Words
ICCV 2017
Temporal Multimodal Learning in Audiovisual Speech Recognition
CVPR 2016