Papers
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís, Sachit Menon, Carl Vondrick
Virtual Try-On with Pose-Garment Keypoints Guided Inpainting
Zhi Li, Pengfei Wei, Xiang Yin et al.
Visible-Infrared Person Re-Identification via Semantic Alignment and Affinity Inference
Xingye Fang, Yang Yang, Ying Fu
Vision Grid Transformer for Document Layout Analysis
Cheng Da, Chuwei Luo, Qi Zheng et al.
Vision HGNN: An Image is More than a Graph of Nodes
Yan Han, Peihao Wang, Souvik Kundu et al.
Vision Relation Transformer for Unbiased Scene Graph Generation
Gopika Sudhakaran, Devendra Singh Dhami, Kristian Kersting et al.
Vision Transformer Adapters for Generalizable Multitask Learning
Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann
Visual Explanations via Iterated Integrated Attributions
Oren Barkan, Yehonatan Elisha, Yuval Asher et al.
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
Qifan Yu, Juncheng Li, Yu Wu et al.
Visual Traffic Knowledge Graph Generation from Scene Images
Yunfei Guo, Fei Yin, Xiao-hui Li et al.
VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching
Junyu Bi, Daixuan Cheng, Ping Yao et al.
VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
Yanyuan Qiao, Zheng Yu, Qi Wu
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control
Zi-Yuan Hu, Yanyang Li, Michael R. Lyu et al.
VLSlice: Interactive Vision-and-Language Slice Discovery
Eric Slyman, Minsuk Kahng, Stefan Lee
VoroMesh: Learning Watertight Surface Meshes with Voronoi Diagrams
Nissim Maruani, Roman Klokov, Maks Ovsjanikov et al.
Vox-E: Text-Guided Voxel Editing of 3D Objects
Etai Sella, Gal Fiebelman, Peter Hedman et al.
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
Kyle Sargent, Jing Yu Koh, Han Zhang et al.
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
Yanan Wang, Michihiro Yasunaga, Hongyu Ren et al.
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers
Chongyan Chen, Samreen Anjum, Danna Gurari
Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts
Karsten Roth, Jae Myung Kim, A. Sophia Koepke et al.
WALDO: Future Video Synthesis Using Object Layer Decomposition and Parametric Flow Prediction
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation
Cristiano Saltori, Aljosa Osep, Elisa Ricci et al.
WaterMask: Instance Segmentation for Underwater Imagery
Shijie Lian, Hua Li, Runmin Cong et al.
WaveIPT: Joint Attention and Flow Alignment in the Wavelet domain for Pose Transfer
Liyuan Ma, Tingwei Gao, Haitian Jiang et al.