Yuhang Cao
23 papers · 2017–2026 · 9 top CS/AI conferences
Achievements
Academic Marathon (9)
Conference Polyglot (9)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (5)
Renaissance Researcher (6)
Taxonomy Completionist (51)
Dynamic Duo (19)
Mega-Team (24)
Topic Evolution
The Questioner (2)
Century Club (23)
Prolific Year (14)
Keyword Collector (126)
Conferences
CVPR (6)
ICCV (6)
ACL (2)
ICML (2)
INTERSPEECH (2)
NIPS (2)
ECCV (1)
ICLR (1)
WACV (1)
Keywords
object detection (4)
vision-language model (3)
multimodal learning (3)
diffusion model (2)
video understanding (2)
temporal consistency (2)
vision language model (2)
reinforcement learning (2)
video language model (2)
instruction following (2)
multimodal large language model (2)
direct preference optimization (1)
vision transformer (1)
attention mechanism (1)
preference learning (1)
sampling strategy (1)
benchmark evaluation (1)
3d reconstruction (1)
multi-modal learning (1)
speech separation (1)
Papers
OMeGa: Joint Optimization of Explicit Meshes and Gaussian Splats for Robust Scene-Level Surface Reconstruction
WACV 2026
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings
ACL 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
V3Det: Vast Vocabulary Visual Detection Dataset
ICCV 2023
Few-Shot Object Detection via Association and DIscrimination
NIPS 2021
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
Side-Aware Boundary Localization for More Precise Object Detection
ECCV 2020
Prime Sample Attention in Object Detection
CVPR 2020
Investigation of Cost Function for Supervised Monaural Speech Separation
INTERSPEECH 2019
Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern
INTERSPEECH 2017