Papers
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman et al.
Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang et al.
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani, Zhaowen Wang, Difan Liu et al.
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia, Xinliang Wang, Feng Lv et al.
ViT-Lens: Towards Omni-modal Representations
Weixian Lei, Yixiao Ge, Kun Yi et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
VkD: Improving Knowledge Distillation using Orthogonal Projections
Roy Miles, Ismail Elezi, Jiankang Deng
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
Fan Fei, Jiajun Tang, Ping Tan et al.
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu, Jiaxin Zhuang, Hao Chen
Volumetric Environment Representation for Vision-Language Navigation
Rui Liu, Wenguan Wang, Yi Yang
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
Phong Tran, Egor Zakharov, Long-Nhat Ho et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, Haibo Yang et al.
VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos
Wen Xue, Le Jiang, Lianxin Xie et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao et al.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi