Zilong Huang

24 papers · 2017–2025 · 7 conferences · across top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (7) 🐝 Cross-Pollinator (10)

🗺️ Taxonomy Completionist (43) 🧬 Topic Evolution 🚀 Conference Pioneer 🔥 Unstoppable (9) 🗃️ Keyword Collector (115) 💎 Century Club (24) ⚡ Prolific Year (5)

Conferences

CVPR (9) ICCV (6) NIPS (3) AAAI (2) ICLR (2) INTERSPEECH (1) WACV (1)

Top co-authors

Jiashi Feng (8) Xinggang Wang (6) Gang Yu (5) Yunchao Wei (5) Jun Hao Liew (4) Bingyi Kang (4) Tao Chen (3) Wenyu Liu (3) Thomas S. Huang (2) Honghui Shi (2)

Keywords

semantic segmentation (9) representation learning (3) depth estimation (3) monocular depth (3) image classification (2) self-supervised learning (2) vision transformer (2) human parsing (2) metric depth (2) convolutional neural network (2) monocular depth estimation (2) diffusion model (2) image matting (2) image generation (2) attention mechanism (1) object detection (1) computer vision (1) zero-shot learning (1) domain adaptation (1) contextual information (1)

Papers

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration · CVPR 2025
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention · CVPR 2025
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos · CVPR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models · ICLR 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation · ICCV 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer · ICCV 2025
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing · ICCV 2025
Disentangled Pre-Training for Image Matting · WACV 2024
Depth Anything V2 · NIPS 2024
Classification Done Right for Vision-Language Pre-Training · NIPS 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data · CVPR 2024
MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation · INTERSPEECH 2024
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation · ICLR 2023
Executing Your Commands via Motion Diffusion in Latent Space · CVPR 2023
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation · CVPR 2022
Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations · NIPS 2022
High-Resolution Deep Image Matting · AAAI 2021
Human De-Occlusion: Invisible Perception and Recovery for Humans · CVPR 2021
Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis · CVPR 2020
SPGNet: Semantic Prediction Guidance for Scene Parsing · ICCV 2019
Devil in the Details: Towards Accurate Single and Multiple Human Parsing · AAAI 2019
CCNet: Criss-Cross Attention for Semantic Segmentation · ICCV 2019
Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing · CVPR 2018
Object-Level Proposals · ICCV 2017