Saining Xie
52 papers · 2015–2025 · 8 top CS/AI conferences
Achievements
Conference Polyglot (8)
Academic Marathon (10)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (8)
Hot Topic Early Bird
Keyword Champion (2)
Unstoppable (9)
Prolific Year (14)
Conference Pioneer
Century Club (52)
The Questioner (4)
Trend Setter
Keyword Collector (169)
Conferences
CVPR (18)
ICCV (12)
ECCV (7)
ICLR (7)
ICML (3)
NIPS (3)
AISTATS (1)
EMNLP (1)
Top co-authors
Keywords
self-supervised learning (8)
contrastive learning (7)
convolutional neural network (6)
representation learning (5)
transfer learning (5)
image classification (5)
semantic segmentation (4)
generative model (4)
multimodal large language model (4)
multimodal learning (4)
large language model (4)
vision transformer (3)
neural architecture search (3)
visual grounding (3)
vision language model (3)
vision-language model (3)
image generation (3)
visual representation learning (3)
autoregressive model (2)
diffusion model (2)
Papers
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop
ICML 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
ICML 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
ICLR 2025
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
ICLR 2025
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
CVPR 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
CVPR 2025
Scaling Inference Time Compute for Diffusion Models
CVPR 2025
On Scaling Up 3D Gaussian Splatting Training
ICLR 2025
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
ICLR 2025
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
ICLR 2025
Scaling Language-Free Visual Representation Learning
ICCV 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
ICCV 2025
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
ICCV 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis
CVPR 2025
Fast Encoding and Decoding for Implicit Video Representation
ECCV 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
NIPS 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
NIPS 2024
Image Sculpting: Precise Object Editing with 3D Geometry Control
CVPR 2024
MoDE: CLIP Data Experts via Clustering
CVPR 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
CVPR 2024
V-IRL: Grounding Virtual Intelligence in Real Life
ECCV 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
ECCV 2024
Altogether: Image Captioning via Re-aligning Alt-text
EMNLP 2024
Demystifying CLIP Data
ICLR 2024
CiT: Curation in Training for Effective Vision-Language Data
ICCV 2023
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders
CVPR 2023
Scalable Diffusion Models with Transformers
ICCV 2023
Going Denser with Open-Vocabulary Part Segmentation
ICCV 2023
SLIP: Self-Supervision Meets Language-Image Pre-training
ECCV 2022
A ConvNet for the 2020s
CVPR 2022
Masked Autoencoders Are Scalable Vision Learners
CVPR 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
CVPR 2022
Exploring Data-Efficient 3D Scene Understanding With Contrastive Scene Contexts
CVPR 2021
Pri3D: Can 3D Priors Help 2D Representation Learning?
ICCV 2021
An Empirical Study of Training Self-Supervised Vision Transformers
ICCV 2021
On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness
NIPS 2021
Graph Structure of Neural Networks
ICML 2020
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions
CVPR 2020
Momentum Contrast for Unsupervised Visual Representation Learning
CVPR 2020
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding
ECCV 2020
Are Labels Necessary for Neural Architecture Search?
ECCV 2020
Decoupling Representation and Classifier for Long-Tailed Recognition
ICLR 2020
Order-Aware Generative Modeling Using the 3D-Craft Dataset
ICCV 2019
Exploring Randomly Wired Neural Networks for Image Recognition
ICCV 2019
On Network Design Spaces for Visual Recognition
ICCV 2019
Attentional ShapeContextNet for Point Cloud Recognition
CVPR 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
ECCV 2018
Aggregated Residual Transformations for Deep Neural Networks
CVPR 2017
Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification
CVPR 2015
Holistically-Nested Edge Detection
ICCV 2015
Deeply-Supervised Nets
AISTATS 2015