video prediction

140 papers

Co-occurring keywords

video generation (703), recurrent neural network (1790), diffusion model (3720), representation learning (6174), self-supervised learning (3751), motion prediction (135), generative model (2889), unsupervised learning (3255), world model (180), convolutional neural network (4216)

Papers

Show Me: Unifying Instructional Image and Video Generation with Diffusion Models WACV 2026

RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention AAAI 2026

H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation AAAI 2026

Human2Robot: Learning Robot Actions from Paired Human-Robot Videos AAAI 2026

Unified Video Action Model RSS 2025

Diffusion-Based Imaginative Coordination for Bimanual Manipulation ICCV 2025

MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps ICCV 2025

DFDNet: Disentangling and Filtering Dynamics for Enhanced Video Prediction AAAI 2025

STDD: Spatio-Temporal Dual Diffusion for Video Generation CVPR 2025

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control CVPR 2025

STLight: A Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal Joint Processing WACV 2025

PhysGen3D: Crafting a Miniature Interactive World from a Single Image CVPR 2025

Aether: Geometric-Aware Unified World Modeling ICCV 2025

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction ICCV 2025

Top-Down Guidance for Learning Object-Centric Representations IJCAI 2025

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics ICCV 2025

Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning ICCV 2025

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction CVPR 2025

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction CVPR 2025

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better CVPR 2025

Learning from Streaming Video with Orthogonal Gradients CVPR 2025

CAGE: Unsupervised Visual Composition and Animation for Controllable Video Generation AAAI 2025

PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding CVPR 2024

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability NeurIPS 2024

iVideoGPT: Interactive VideoGPTs are Scalable World Models NeurIPS 2024