Sergey Tulyakov

88 papers · 2015–2025 · 11 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (11) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (5) 🌈 Renaissance Researcher (8) 🗺️ Taxonomy Completionist (83) 🏠 Conference Loyalist (39) 🤝 Dynamic Duo (39) 👑 Triple Crown 🏆 Keyword Champion (20) 🏆 Grand Slam 🔬 Deep Specialist (25) 💎 Century Club (88) ⚡ Prolific Year (18) 🗃️ Keyword Collector (352) 🔥 Unstoppable (8) 📈 Trend Setter ❓ The Questioner

Conferences

CVPR (39) ICLR (11) NIPS (11) ICCV (10) ECCV (8) ICML (3) WACV (2) AAAI (1) ACL (1) EMNLP (1) NAACL (1)

Top co-authors

Jian Ren (39) Aliaksandr Siarohin (32) Hsin-Ying Lee (23) Ivan Skorokhodov (20) Willi Menapace (19) Yanyu Li (13) Menglei Chai (12) Kyle Olszewski (11) Chaoyang Wang (10) Yanzhi Wang (10)

Research topics

Architectures (1) Core AI (1)

Keywords

video generation (20) diffusion model (18) image generation (9) novel view synthesis (8) multimodal learning (7) 3d reconstruction (6) video diffusion (5) generative model (5) neural rendering (5) knowledge distillation (5) model compression (4) generative adversarial network (4) diffusion transformer (4) text-to-image generation (4) video synthesis (4) unsupervised learning (4) neural radiance field (4) volumetric rendering (4) text-to-video generation (3) 3d generation (3)

Papers

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training · CVPR 2025
Can Text-to-Video Generation help Video-Language Alignment? · CVPR 2025
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion · CVPR 2025
Wonderland: Navigating 3D Scenes from a Single Image · CVPR 2025
Multi-subject Open-set Personalization in Video Generation · CVPR 2025
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers · CVPR 2025
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device · CVPR 2025
Omni-ID: Holistic Identity Representation Designed for Generative Tasks · CVPR 2025
Mind the Time: Temporally-Controlled Multi-Event Video Generation · CVPR 2025
Video Motion Transfer with Diffusion Transformers · CVPR 2025
DELTA: Dense Efficient Long-Range 3D Tracking for Any Video · ICLR 2025
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement · ICLR 2025
Lightweight Predictive 3D Gaussian Splats · ICLR 2025
Scalable Ranked Preference Optimization for Text-to-Image Generation · ICCV 2025
T2Bs: Text-to-Character Blendshapes via Video Generation · ICCV 2025
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation · ICCV 2025
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis · ICCV 2025
Improving the Diffusability of Autoencoders · ICML 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models · ICML 2025
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control · ICLR 2025
TextCraftor: Your Text Encoder Can be Image Quality Controller · CVPR 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis · CVPR 2024
VIMI: Grounding Video Generation through Multi-modal Instruction · EMNLP 2024
Efficient Training with Denoised Neural Weights · ECCV 2024
UpFusion: Novel View Diffusion from Unposed Sparse View Observations · ECCV 2024
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors · CVPR 2024
Towards Text-guided 3D Scene Composition · CVPR 2024
E²GAN: Efficient Training of Efficient GANs for Image-to-Image Translation · ICML 2024
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models · NIPS 2024
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation · NIPS 2024
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model · NIPS 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers · CVPR 2024
TC4D: Trajectory-Conditioned Text-to-4D Generation · ECCV 2024
MyVLM: Personalizing VLMs for User-Specific Queries · ECCV 2024
SF-V: Single Forward Video Generation Model · NIPS 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation · CVPR 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents · ACL 2024
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion · ICLR 2024
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors · ICLR 2024
SPAD: Spatially Aware Multi-View Diffusers · CVPR 2024
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling · CVPR 2024
Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation · WACV 2023
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds · NIPS 2023
LightSpeed: Light and Fast Neural Light Fields on Mobile Devices · NIPS 2023
Autodecoding Latent 3D Diffusion Models · NIPS 2023
DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis · CVPR 2023
Make-a-Story: Visual Memory Conditioned Consistent Story Generation · CVPR 2023
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation · CVPR 2023
Invertible Neural Skinning · CVPR 2023
Affection: Learning Affective Explanations for Real-World Visual Data · CVPR 2023
Real-Time Neural Light Field on Mobile Devices · CVPR 2023
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars · CVPR 2023
Unsupervised Volumetric Animation · CVPR 2023
ShapeTalk: A Language Dataset and Framework for 3D Shape Edits and Deformations · CVPR 2023
Rethinking Vision Transformers for MobileNet Size and Speed · ICCV 2023
Text2Tex: Text-driven Texture Synthesis via Diffusion Models · ICCV 2023
InfiniCity: Infinite-Scale City Synthesis · ICCV 2023
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation · ICLR 2023
3D generation on ImageNet · ICLR 2023
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 · CVPR 2022
InOut: Diverse Image Outpainting via GAN Inversion · CVPR 2022
Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training · NIPS 2022
EfficientFormer: Vision Transformers at MobileNet Speed · NIPS 2022
Playable Environments: Video Manipulation in Space and Time · CVPR 2022
InfinityGAN: Towards Infinite-Pixel Image Synthesis · ICLR 2022
F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization · ICLR 2022
EpiGRAF: Rethinking training of 3D GANs · NIPS 2022
R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis · ECCV 2022
Quantized GAN for Complex Music Generation from Dance Videos · ECCV 2022
Cross-Modal 3D Shape Generation and Manipulation · ECCV 2022
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning · CVPR 2022
A Good Image Generator Is What You Need for High-Resolution Video Synthesis · ICLR 2021
SMIL: Multimodal Learning with Severely Missing Modality · AAAI 2021
Task-Assisted Domain Adaptation With Anchor Tasks · WACV 2021
Flow Guided Transformable Bottleneck Networks for Motion Retargeting · CVPR 2021
Teachers Do More Than Teach: Compressing Image-to-Image Models · CVPR 2021
Playable Video Generation · CVPR 2021
Motion Representations for Articulated Animation · CVPR 2021
Neural Hair Rendering · ECCV 2020
Transformable Bottleneck Networks · ICCV 2019
Laplace Landmark Localization · ICCV 2019
Animating Arbitrary Objects via Deep Motion Transfer · CVPR 2019
3D Guided Fine-Grained Face Manipulation · CVPR 2019
Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering · NAACL 2019
First Order Motion Model for Image Animation · NIPS 2019
MoCoGAN: Decomposing Motion and Content for Video Generation · CVPR 2018
Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions · CVPR 2016
Regressing a 3D Face Shape From a Single Image · ICCV 2015