Papers
Variational Distribution Learning for Unsupervised Text-to-Image Generation
Minsoo Kang, Doyup Lee, Jiseob Kim et al.
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization
Bingfan Zhu, Yanchao Yang, Xulong Wang et al.
VecFontSDF: Learning To Reconstruct and Synthesize High-Quality Vector Fonts via Signed Distance Functions
Zeqing Xia, Bojun Xiong, Zhouhui Lian
VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
Bingchen Yang, Haiyong Jiang, Hao Pan et al.
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
Ajay Jain, Amber Xie, Pieter Abbeel
Vector Quantization With Self-Attention for Quality-Independent Representation Learning
Zhou Yang, Weisheng Dong, Xin Li et al.
VGFlow: Visibility Guided Flow Network for Human Reposing
Rishabh Jain, Krishna Kumar Singh, Mayur Hemani et al.
Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition
Chen Guo, Tianjian Jiang, Xu Chen et al.
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo et al.
Video Compression With Entropy-Constrained Neural Representations
Carlos Gomes, Roberto Azevedo, Christopher Schroers
Video Dehazing via a Multi-Range Temporal Alignment Network With Physical Prior
Jiaqi Xu, Xiaowei Hu, Lei Zhu et al.
Video Event Restoration Based on Keyframes for Video Anomaly Detection
Zhiwei Yang, Jing Liu, Zhaoyang Wu et al.
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
Limin Wang, Bingkun Huang, Zhiyu Zhao et al.
Video Probabilistic Diffusion Models in Projected Latent Space
Sihyun Yu, Kihyuk Sohn, Subin Kim et al.
Video Test-Time Adaptation for Action Recognition
Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski et al.
Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin, Jinfa Huang, Pengfei Xiong et al.
VideoTrack: Learning To Track Objects via Video Transformer
Fei Xie, Lei Chu, Jiahao Li et al.
ViewNet: A Novel Projection-Based Backbone With View Pooling for Few-Shot Point Cloud Classification
Jiajing Chen, Minmin Yang, Senem Velipasalar
Viewpoint Equivariance for Multi-View 3D Object Detection
Dian Chen, Jie Li, Vitor Guizilini et al.
VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining
Junjie Ke, Keren Ye, Jiahui Yu et al.
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
Yuxin Chen, Zongyang Ma, Ziqi Zhang et al.
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei et al.
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
Junru Gu, Chenxu Hu, Tianyuan Zhang et al.
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
Jeeseung Park, Jin-Woo Park, Jong-Seok Lee
Virtual Occlusions Through Implicit Depth
Jamie Watson, Mohamed Sayed, Zawar Qureshi et al.