Enze Xie

49 papers · 2019–2025 · 9 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (6) 🌍 Conference Polyglot (9) 🌈 Renaissance Researcher (5) 🗺️ Taxonomy Completionist (41)

🐝 Cross-Pollinator (15) 🤝 Dynamic Duo (22) 🏆 Grand Slam 🔬 Deep Specialist (10) 🔥 Unstoppable (7) 💎 Century Club (49) 🚀 Conference Pioneer 🗃️ Keyword Collector (139) ⚡ Prolific Year (6) ❓ The Questioner

Conferences

ICCV (12) ICLR (12) ECCV (8) NIPS (5) CVPR (4) AAAI (3) ICML (3) ACL (1) IJCAI (1)

Top co-authors

Ping Luo (22) Zhenguo Li (18) Wenhai Wang (14) Junsong Chen (12) Song Han (9) Junyu Chen (8) Tong Lu (8) Chongjian GE (7) Han Cai (7) Ding Liang (6)

Keywords

semantic segmentation (11) diffusion model (6) object detection (5) image generation (5) instance segmentation (4) vision transformer (3) scene text detection (3) 3d object detection (3) text-to-image generation (3) dense prediction (2) latent diffusion (2) depth estimation (2) image segmentation (2) feature pyramid (2) multi-modal learning (2) autonomous driving (2) model compression (2) knowledge distillation (1) 3d shape generation (1) adversarial robustness (1)

Papers

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer · ICCV 2025
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation · ICCV 2025
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models · ICLR 2025
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer · ICLR 2025
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers · ICLR 2025
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation · ICLR 2025
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models · ICLR 2025
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer · ICML 2025
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space · ICCV 2025
LEGO-Prover: Neural Theorem Proving with Growing Libraries · ICLR 2024
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving · AAAI 2024
Accelerating Diffusion Sampling with Optimized Time Steps · CVPR 2024
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation · ECCV 2024
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation · ECCV 2024
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts · ECCV 2024
Large Language Models as Automated Aligners for benchmarking Vision-Language Models · ICLR 2024
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning · ICLR 2024
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis · ICLR 2024
MagicDrive: Street View Generation with Diverse 3D Geometry Control · ICLR 2024
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation · ICLR 2024
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation · NIPS 2023
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation · ICCV 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation · ICCV 2023
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-efficient Fine-Tuning · ICCV 2023
Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View · ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction · ICCV 2023
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function · ACL 2023
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation · NIPS 2023
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection · NIPS 2023
DiffComplete: Diffusion-based Generative 3D Shape Completion · NIPS 2023
CycleMLP: A MLP-like Architecture for Dense Prediction · ICLR 2022
Understanding The Robustness in Vision Transformers · ICML 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers · CVPR 2022
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers · ECCV 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization · AAAI 2022
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions · ICCV 2021
Watch Only Once: An End-to-End Video Action Detection Framework · ICCV 2021
DetCo: Unsupervised Contrastive Learning for Object Detection · ICCV 2021
What Makes for End-to-End Object Detection? · ICML 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers · NIPS 2021
Segmenting Transparent Objects in the Wild with Transformer · IJCAI 2021
Segmenting Transparent Objects in the Wild · ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting · ECCV 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation · ECCV 2020
Scene Text Image Super-resolution in the wild · ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation · CVPR 2020
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network · ICCV 2019
Scene Text Detection with Supervised Pyramid Context Network · AAAI 2019
Shape Robust Text Detection With Progressive Scale Expansion Network · CVPR 2019