Papers
Variational Degeneration to Structural Refinement: A Unified Framework for Superimposed Image Decomposition
Wenyu Li, Yan Xu, Yang Yang et al.
Verbs in Action: Improving Verb Understanding in Video-Language Models
Liliane Momeni, Mathilde Caron, Arsha Nagrani et al.
VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis
Xinya Chen, Jiaxin Huang, Yanrui Bin et al.
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
Xingqian Xu, Zhangyang Wang, Gong Zhang et al.
VertexSerum: Poisoning Graph Neural Networks for Link Inference
Ruyi Ding, Shijin Duan, Xiaolin Xu et al.
V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
Nathaniel Burgdorfer, Philippos Mordohai
Video Action Recognition with Attentive Semantic Units
Yifei Chen, Dapeng Chen, Ruijin Liu et al.
Video Action Segmentation via Contextually Refined Temporal Keypoints
Borui Jiang, Yang Jin, Zhentao Tan et al.
Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks
Chenrui Shi, Che Sun, Yuwei Wu et al.
Video Background Music Generation: Dataset, Method and Evaluation
Le Zhuo, Zhaokai Wang, Baisen Wang et al.
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Weikang Bian et al.
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim, Muhammad Uzair Khattak, Muzammal Naseer et al.
Video Object Segmentation-aware Video Frame Interpolation
Jun-Sang Yoo, Hongjae Lee, Seung-Won Jung
Video OWL-ViT: Temporally-consistent Open-world Localization in Video
Georg Heigold, Matthias Minderer, Alexey Gritsenko et al.
Video State-Changing Object Segmentation
Jiangwei Yu, Xiang Li, Xinran Zhao et al.
Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
Thomas E. Huang, Yifan Liu, Luc Van Gool et al.
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
Moayed Haji Ali, Andrew Bond, Tolga Birdal et al.
View Consistent Purification for Accurate Cross-View Localization
Shan Wang, Yanhao Zhang, Akhil Perincherry et al.
Viewing Graph Solvability in Practice
Federica Arrigoni, Tomas Pajdla, Andrea Fusiello
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding
Zoey Guo, Yiwen Tang, Ray Zhang et al.
Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data
Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi
ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
Maya Varma, Jean-Benoit Delbrouck, Sarah Hooper et al.
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang, Zhen Yang, Bin Xu et al.
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng, Biao Gong, Jianwen Jiang et al.
VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations
Jiehong Lin, Zewei Wei, Yabin Zhang et al.