Saining Xie

52 papers · 2015–2025 · 8 top CS/AI conferences

Achievements

🌍 Conference Polyglot (8) 🏃 Academic Marathon (10) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (8) 🐣 Hot Topic Early Bird 🏆 Keyword Champion (2) 🔥 Unstoppable (9) ⚡ Prolific Year (14) 🚀 Conference Pioneer 💎 Century Club (52) ❓ The Questioner (4) 📈 Trend Setter 🗃️ Keyword Collector (169)

Conferences

CVPR (18) ICCV (12) ECCV (7) ICLR (7) ICML (3) NIPS (3) AISTATS (1) EMNLP (1)

Top co-authors

Kaiming He (8) Xinlei Chen (7) Zhuang Liu (6) Shengbang Tong (6) Christoph Feichtenhofer (5) Zhuowen Tu (5) Yann LeCun (5) Ross Girshick (5) Piotr Dollár (4) Jihan Yang (4)

Keywords

self-supervised learning (8) contrastive learning (7) convolutional neural network (6) representation learning (5) transfer learning (5) image classification (5) semantic segmentation (4) generative model (4) multimodal large language model (4) multimodal learning (4) large language model (4) vision transformer (3) neural architecture search (3) visual grounding (3) vision language model (3) vision-language model (3) image generation (3) visual representation learning (3) autoregressive model (2) diffusion model (2)

Papers

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop · ICML 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training · ICML 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark · ICLR 2025
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing · ICLR 2025
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces · CVPR 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis · CVPR 2025
Scaling Inference Time Compute for Diffusion Models · CVPR 2025
On Scaling Up 3D Gaussian Splatting Training · ICLR 2025
Deconstructing Denoising Diffusion Models for Self-Supervised Learning · ICLR 2025
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think · ICLR 2025
Scaling Language-Free Visual Representation Learning · ICCV 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning · ICCV 2025
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers · ICCV 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis · CVPR 2025
Fast Encoding and Decoding for Implicit Video Representation · ECCV 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs · NIPS 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning · NIPS 2024
Image Sculpting: Precise Object Editing with 3D Geometry Control · CVPR 2024
MoDE: CLIP Data Experts via Clustering · CVPR 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs · CVPR 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs · CVPR 2024
V-IRL: Grounding Virtual Intelligence in Real Life · ECCV 2024
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers · ECCV 2024
Altogether: Image Captioning via Re-aligning Alt-text · EMNLP 2024
Demystifying CLIP Data · ICLR 2024
CiT: Curation in Training for Effective Vision-Language Data · ICCV 2023
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders · CVPR 2023
Scalable Diffusion Models with Transformers · ICCV 2023
Going Denser with Open-Vocabulary Part Segmentation · ICCV 2023
SLIP: Self-Supervision Meets Language-Image Pre-training · ECCV 2022
A ConvNet for the 2020s · CVPR 2022
Masked Autoencoders Are Scalable Vision Learners · CVPR 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training · CVPR 2022
Exploring Data-Efficient 3D Scene Understanding With Contrastive Scene Contexts · CVPR 2021
Pri3D: Can 3D Priors Help 2D Representation Learning? · ICCV 2021
An Empirical Study of Training Self-Supervised Vision Transformers · ICCV 2021
On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness · NIPS 2021
Graph Structure of Neural Networks · ICML 2020
FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions · CVPR 2020
Momentum Contrast for Unsupervised Visual Representation Learning · CVPR 2020
PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding · ECCV 2020
Are Labels Necessary for Neural Architecture Search? · ECCV 2020
Decoupling Representation and Classifier for Long-Tailed Recognition · ICLR 2020
Order-Aware Generative Modeling Using the 3D-Craft Dataset · ICCV 2019
Exploring Randomly Wired Neural Networks for Image Recognition · ICCV 2019
On Network Design Spaces for Visual Recognition · ICCV 2019
Attentional ShapeContextNet for Point Cloud Recognition · CVPR 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification · ECCV 2018
Aggregated Residual Transformations for Deep Neural Networks · CVPR 2017
Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification · CVPR 2015
Holistically-Nested Edge Detection · ICCV 2015
Deeply-Supervised Nets · AISTATS 2015