Papers
UTC: A Unified Transformer With Inter-Task Contrastive Learning for Visual Dialog
Cheng Chen, Zhenshan Tan, Qingrong Cheng et al.
V2C: Visual Voice Cloning
Qi Chen, Mingkui Tan, Yuankai Qi et al.
VALHALLA: Visual Hallucination for Machine Translation
Yi Li, Rameswar Panda, Yoon Kim et al.
vCLIMB: A Novel Video Class Incremental Learning Benchmark
Andrés Villa, Kumail Alhamoud, Victor Escorcia et al.
V-Doc: Visual Questions Answers With Documents
Yihao Ding, Zhe Huang, Runlin Wang et al.
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu, Dong Chen, Jianmin Bao et al.
Vehicle Trajectory Prediction Works, but Not Everywhere
Mohammadhossein Bahari, Saeed Saadatnejad, Ahmad Rahimi et al.
Versatile Multi-Modal Pre-Training for Human-Centric Perception
Fangzhou Hong, Liang Pan, Zhongang Cai et al.
VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning
Wenjia Xu, Yongqin Xian, Jiuniu Wang et al.
Video Demoireing With Relation-Based Temporal Consistency
Peng Dai, Xin Yu, Lan Ma et al.
Video Frame Interpolation Transformer
Zhihao Shi, Xiangyu Xu, Xiaohong Liu et al.
Video Frame Interpolation With Transformer
Liying Lu, Ruizheng Wu, Huaijia Lin et al.
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Zeyuan Chen, Yinbo Chen, Jingwen Liu et al.
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
Xiangtai Li, Wenwei Zhang, Jiangmiao Pang et al.
Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training
Xiao Lu, Yihong Cao, Sheng Liu et al.
Video Swin Transformer
Ze Liu, Jia Ning, Yue Cao et al.
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
Dohwan Ko, Joonmyung Choi, Juyeon Ko et al.
ViM: Out-of-Distribution With Virtual-Logit Matching
Haoqi Wang, Zhizhong Li, Litong Feng et al.
Virtual Correspondence: Humans as a Cue for Extreme-View Geometry
Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang et al.
Virtual Elastic Objects
Hsiao-yu Chen, Edith Tretschk, Tuur Stuyck et al.
VisCUIT: Visual Auditor for Bias in CNN Image Classifier
Seongmin Lee, Zijie J. Wang, Judy Hoffman et al.
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Pengyu Zhang, Jie Zhao, Dong Wang et al.
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song, Jianqiang Wan, Zhibo Yang et al.
Vision-Language Pre-Training With Triple Contrastive Learning
Jinyu Yang, Jiali Duan, Son Tran et al.
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
Arnav Chavan, Zhiqiang Shen, Zhuang Liu et al.