Yuhang Zang
36 papers · 2019–2026 · 8 conferences · across top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (10)
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (8)
Dynamic Duo (26)
Grand Slam
Mega-Team (24)
Deep Specialist (12)
Keyword Collector (160)
Prolific Year (10)
The Questioner (3)
Century Club (35)
Conference Pioneer
Conferences
ICCV (10)
CVPR (7)
NIPS (6)
ACL (3)
ECCV (3)
ICLR (3)
AAAI (2)
ICML (2)
Keywords
vision-language model (8)
multimodal learning (7)
video understanding (4)
large vision-language model (3)
object detection (3)
large language model (3)
instance segmentation (3)
multi-modal learning (3)
multimodal large language model (3)
temporal consistency (2)
benchmark evaluation (2)
long-tailed distribution (2)
reinforcement learning (2)
video language model (2)
diffusion model (2)
vision language model (2)
instruction following (2)
semantic segmentation (2)
scene text detection (2)
neural network (2)
Papers
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
ACL 2026
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following
ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate
ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
ICCV 2025
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
ICCV 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
ICCV 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings
ACL 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
WildAvatar: Learning In-the-wild 3D Avatars from the Web
CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way
CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models
CVPR 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
ICLR 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
ICML 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
ICLR 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
ACL 2025
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
ECCV 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations
NIPS 2024
Streaming Long Video Understanding with Large Language Models
NIPS 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
CVPR 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
NIPS 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
ECCV 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
ICLR 2024
Open-Vocabulary DETR with Conditional Matching
ECCV 2022
Seesaw Loss for Long-Tailed Instance Segmentation
CVPR 2021
FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation
ICCV 2021
KPNet: Towards Minimal Face Detector
AAAI 2020
Scene Text Detection with Supervised Pyramid Context Network
AAAI 2019
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network
ICCV 2019