Papers
Visual Programming: Compositional Visual Reasoning Without Training
Tanmay Gupta, Aniruddha Kembhavi
Visual Prompt Multi-Modal Tracking
Jiawen Zhu, Simiao Lai, Xin Chen et al.
Visual Prompt Tuning for Generative Transfer Learning
Kihyuk Sohn, Huiwen Chang, José Lezama et al.
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Cheng-Hao Tu, Zheda Mai, Wei-Lun Chao
Visual Recognition by Request
Chufeng Tang, Lingxi Xie, Xiaopeng Zhang et al.
Visual Recognition-Driven Image Restoration for Multiple Degradation With Intrinsic Semantics Recovery
Zizheng Yang, Jie Huang, Jiahao Chang et al.
Visual-Tactile Sensing for In-Hand Object Reconstruction
Wenqiang Xu, Zhenjun Yu, Han Xue et al.
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting
Syed Talal Wasim, Muzammal Naseer, Salman Khan et al.
ViTs for SITS: Vision Transformers for Satellite Image Time Series
Michail Tarasiou, Erik Chavez, Stefanos Zafeiriou
VIVE3D: Viewpoint-Independent Video Editing Using 3D-Aware GANs
Anna Frühstück, Nikolaos Sarafianos, Yuanlu Xu et al.
VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
Mengyin Liu, Jie Jiang, Chao Zhu et al.
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
Ziqin Wang, Bowen Cheng, Lichen Zhao et al.
vMAP: Vectorised Object Mapping for Neural Field SLAM
Xin Kong, Shikun Liu, Marwan Taher et al.
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Jaeill Kim, Suhyun Kang, Duhun Hwang et al.
VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
Yufan Ren, Fangjinhua Wang, Tong Zhang et al.
VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval
Siteng Huang, Biao Gong, Yulin Pan et al.
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
Yukang Chen, Jianhui Liu, Xiangyu Zhang et al.
VoxFormer: Sparse Voxel Transformer for Camera-Based 3D Semantic Scene Completion
Yiming Li, Zhiding Yu, Christopher Choy et al.
VQACL: A Novel Visual Question Answering Continual Learning Setting
Xi Zhang, Feifei Zhang, Changsheng Xu
Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring
Joanna Hong, Minsu Kim, Jeongsoo Choi et al.
Wavelet Diffusion Models Are Fast and Scalable Image Generators
Hao Phung, Quan Dao, Anh Tran
Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving
Ruibo Li, Hanyu Shi, Ziang Fu et al.
Weakly-Supervised Domain Adaptive Semantic Segmentation With Prototypical Contrastive Learning
Anurag Das, Yongqin Xian, Dengxin Dai et al.
Weakly Supervised Monocular 3D Object Detection Using Multi-View Projection and Direction Consistency
Runzhou Tao, Wencheng Han, Zhongying Qiu et al.
Weakly Supervised Posture Mining for Fine-Grained Classification
Zhenchao Tang, Hualin Yang, Calvin Yu-Chian Chen