conftrace_

Yuhang Zang

36 papers · 2019–2026 · 8 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer · 🗺️ Taxonomy Completionist (10) · 🌉 Interdisciplinary Bridge · 🌈 Renaissance Researcher (5) · 🌍 Conference Polyglot (8) · 🤝 Dynamic Duo (26) · 🏆 Grand Slam · 👥 Mega-Team (24) · 🔬 Deep Specialist (12) · 🗃️ Keyword Collector (160) · ⚡ Prolific Year (10) · ❓ The Questioner (3) · 💎 Century Club (35) · 🚀 Conference Pioneer

Conferences

ICCV (10) CVPR (7) NIPS (6) ACL (3) ECCV (3) ICLR (3) AAAI (2) ICML (2)

Papers

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing · ACL 2026
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion · ICCV 2025
MM-IFEngine: Towards Multimodal Instruction Following · ICCV 2025
Visual-RFT: Visual Reinforcement Fine-Tuning · ICCV 2025
Deciphering Cross-Modal Alignment in Large Vision-Language Models via Modality Integration Rate · ICCV 2025
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data · ICCV 2025
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation · ICCV 2025
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting · ICCV 2025
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree · ICCV 2025
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings · ACL 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation · ICML 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? · CVPR 2025
WildAvatar: Learning In-the-wild 3D Avatars from the Web · CVPR 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction · CVPR 2025
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way · CVPR 2025
Conical Visual Concentration for Efficient Large Vision-Language Models · CVPR 2025
MotionClone: Training-Free Motion Cloning for Controllable Video Generation · ICLR 2025
VideoRoPE: What Makes for Good Video Rotary Position Embedding? · ICML 2025
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · ICLR 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · ACL 2025
MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo · ECCV 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions · NIPS 2024
Are We on the Right Way for Evaluating Large Vision-Language Models? · NIPS 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD · NIPS 2024
MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations · NIPS 2024
Streaming Long Video Understanding with Large Language Models · NIPS 2024
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want · CVPR 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs · NIPS 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP · ECCV 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization · ICLR 2024
Open-Vocabulary DETR with Conditional Matching · ECCV 2022
Seesaw Loss for Long-Tailed Instance Segmentation · CVPR 2021
FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation · ICCV 2021
KPNet: Towards Minimal Face Detector · AAAI 2020
Scene Text Detection with Supervised Pyramid Context Network · AAAI 2019
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network · ICCV 2019