Zilong Huang
24 papers · 2017–2025 · 7 conferences · across top CS/AI conferences
Achievements
Academic Marathon (8)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (7)
Cross-Pollinator (10)
Taxonomy Completionist (43)
Topic Evolution
Conference Pioneer
Unstoppable (9)
Keyword Collector (115)
Century Club (24)
Prolific Year (5)
Conferences
CVPR (9)
ICCV (6)
NIPS (3)
AAAI (2)
ICLR (2)
INTERSPEECH (1)
WACV (1)
Keywords
semantic segmentation (9)
representation learning (3)
depth estimation (3)
monocular depth (3)
image classification (2)
self-supervised learning (2)
vision transformer (2)
human parsing (2)
metric depth (2)
convolutional neural network (2)
monocular depth estimation (2)
diffusion model (2)
image matting (2)
image generation (2)
attention mechanism (1)
object detection (1)
computer vision (1)
zero-shot learning (1)
domain adaptation (1)
contextual information (1)
Papers
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
CVPR 2025
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
CVPR 2025
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
CVPR 2025
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
ICLR 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
ICCV 2025
Disentangled Pre-Training for Image Matting
WACV 2024
Depth Anything V2
NIPS 2024
Classification Done Right for Vision-Language Pre-Training
NIPS 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
CVPR 2024
MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation
INTERSPEECH 2024
SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation
ICLR 2023
Executing Your Commands via Motion Diffusion in Latent Space
CVPR 2023
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
CVPR 2022
Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations
NIPS 2022
High-Resolution Deep Image Matting
AAAI 2021
Human De-Occlusion: Invisible Perception and Recovery for Humans
CVPR 2021
Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis
CVPR 2020
SPGNet: Semantic Prediction Guidance for Scene Parsing
ICCV 2019
Devil in the Details: Towards Accurate Single and Multiple Human Parsing
AAAI 2019
CCNet: Criss-Cross Attention for Semantic Segmentation
ICCV 2019
Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing
CVPR 2018
Object-Level Proposals
ICCV 2017