Enze Xie
49 papers · 2019–2025 · 9 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Academic Marathon (6)
Conference Polyglot (9)
Renaissance Researcher (5)
Taxonomy Completionist (41)
Cross-Pollinator (15)
Dynamic Duo (22)
Grand Slam
Deep Specialist (10)
Unstoppable (7)
Century Club (49)
Conference Pioneer
Keyword Collector (139)
Prolific Year (6)
The Questioner
Conferences
ICCV (12)
ICLR (12)
ECCV (8)
NIPS (5)
CVPR (4)
AAAI (3)
ICML (3)
ACL (1)
IJCAI (1)
Keywords
semantic segmentation (11)
diffusion model (6)
object detection (5)
image generation (5)
instance segmentation (4)
vision transformer (3)
scene text detection (3)
3d object detection (3)
text-to-image generation (3)
dense prediction (2)
latent diffusion (2)
depth estimation (2)
image segmentation (2)
feature pyramid (2)
multi-modal learning (2)
autonomous driving (2)
model compression (2)
knowledge distillation (1)
3d shape generation (1)
adversarial robustness (1)
Papers
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
ICCV 2025
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
ICCV 2025
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
ICLR 2025
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
ICLR 2025
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
ICLR 2025
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
ICLR 2025
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
ICLR 2025
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
ICML 2025
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
ICCV 2025
LEGO-Prover: Neural Theorem Proving with Growing Libraries
ICLR 2024
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
AAAI 2024
Accelerating Diffusion Sampling with Optimized Time Steps
CVPR 2024
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
ECCV 2024
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
ECCV 2024
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
ECCV 2024
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
ICLR 2024
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
ICLR 2024
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
ICLR 2024
MagicDrive: Street View Generation with Diverse 3D Geometry Control
ICLR 2024
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
ICLR 2024
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
NIPS 2023
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation
ICCV 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
ICCV 2023
DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-efficient Fine-Tuning
ICCV 2023
Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View
ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function
ACL 2023
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
NIPS 2023
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
NIPS 2023
DiffComplete: Diffusion-based Generative 3D Shape Completion
NIPS 2023
CycleMLP: A MLP-like Architecture for Dense Prediction
ICLR 2022
Understanding The Robustness in Vision Transformers
ICML 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022
BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
ECCV 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
AAAI 2022
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021
Watch Only Once: An End-to-End Video Action Detection Framework
ICCV 2021
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021
What Makes for End-to-End Object Detection?
ICML 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
NIPS 2021
Segmenting Transparent Objects in the Wild with Transformer
IJCAI 2021
Segmenting Transparent Objects in the Wild
ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
Scene Text Image Super-resolution in the wild
ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation
CVPR 2020
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network
ICCV 2019
Scene Text Detection with Supervised Pyramid Context Network
AAAI 2019
Shape Robust Text Detection With Progressive Scale Expansion Network
CVPR 2019