Yuhang Zang

36 papers · 2019–2026 · 8 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (8)

🤝 Dynamic Duo (26) 🏆 Grand Slam 👥 Mega-Team (24) 🔬 Deep Specialist (12) 🗃️ Keyword Collector (160) ⚡ Prolific Year (10) ❓ The Questioner (3) 💎 Century Club (35) 🚀 Conference Pioneer

Conferences

ICCV (10) CVPR (7) NIPS (6) ACL (3) ECCV (3) ICLR (3) AAAI (2) ICML (2)

Top co-authors

Jiaqi Wang (27) Xiaoyi Dong (25) Pan Zhang (24) Dahua Lin (20) Yuhang Cao (16) Haodong Duan (10) Tong Wu (6) Ziyu Liu (5) Yuanjun Xiong (5) Shuangrui Ding (5)

Keywords

vision-language model (8) multimodal learning (7) video understanding (4) large vision-language model (3) object detection (3) large language model (3) instance segmentation (3) multi-modal learning (3) multimodal large language model (3) temporal consistency (2) benchmark evaluation (2) long-tailed distribution (2) reinforcement learning (2) video language model (2) diffusion model (2) vision language model (2) instruction following (2) semantic segmentation (2) scene text detection (2) neural network (2)

Papers

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · ACL 2026
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion · ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning · ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate · ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data · ICCV 2025
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation · ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting · ICCV 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree · ICCV 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings · ACL 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
WildAvatar: Learning In-the-wild 3D Avatars from the Web · CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation · ICLR 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo · ECCV 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations · NIPS 2024
Streaming Long Video Understanding with Large Language Models · NIPS 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · CVPR 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · NIPS 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP · ECCV 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization · ICLR 2024
Open-Vocabulary DETR with Conditional Matching · ECCV 2022
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation · ICCV 2021
KPNet: Towards Minimal Face Detector · AAAI 2020
Scene Text Detection with Supervised Pyramid Context Network · AAAI 2019
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network · ICCV 2019