Papers
Video Probabilistic Diffusion Models in Projected Latent Space
Sihyun Yu, Kihyuk Sohn, Subin Kim et al.
Video Test-Time Adaptation for Action Recognition
Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski et al.
Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin, Jinfa Huang, Pengfei Xiong et al.
VideoTrack: Learning To Track Objects via Video Transformer
Fei Xie, Lei Chu, Jiahao Li et al.
ViewNet: A Novel Projection-Based Backbone With View Pooling for Few-Shot Point Cloud Classification
Jiajing Chen, Minmin Yang, Senem Velipasalar
Viewpoint Equivariance for Multi-View 3D Object Detection
Dian Chen, Jie Li, Vitor Guizilini et al.
VILA: Learning Image Aesthetics From User Comments With Vision-Language Pretraining
Junjie Ke, Keren Ye, Jiahui Yu et al.
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval
Yuxin Chen, Zongyang Ma, Ziqi Zhang et al.
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei et al.
ViP3D: End-to-End Visual Trajectory Prediction via 3D Agent Queries
Junru Gu, Chenxu Hu, Tianyuan Zhang et al.
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection
Jeeseung Park, Jin-Woo Park, Jong-Seok Lee
Virtual Occlusions Through Implicit Depth
Jamie Watson, Mohamed Sayed, Zawar Qureshi et al.
Virtual Sparse Convolution for Multimodal 3D Object Detection
Hai Wu, Chenglu Wen, Shaoshuai Shi et al.
VisFusion: Visibility-Aware Online 3D Scene Reconstruction From Videos
Huiyu Gao, Wei Mao, Miaomiao Liu
Visibility Aware Human-Object Interaction Tracking From Single RGB Camera
Xianghui Xie, Bharat Lal Bhatnagar, Gerard Pons-Moll
Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark
Muyao Niu, Zhuoxiao Li, Zhihang Zhong et al.
Vision Transformers Are Good Mask Auto-Labelers
Shiyi Lan, Xitong Yang, Zhiding Yu et al.
Vision Transformers Are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin, Yi-Lin Sung, Jie Lei et al.
Visual Atoms: Pre-Training Vision Transformers With Sinusoidal Waves
Sora Takashima, Ryo Hayamizu, Nakamasa Inoue et al.
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
Mingyu Ding, Yikang Shen, Lijie Fan et al.
Visual DNA: Representing and Comparing Images Using Distributions of Neuron Activations
Benjamin Ramtoula, Matthew Gadd, Paul Newman et al.
Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
Xiwen Liang, Minzhe Niu, Jianhua Han et al.
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images
Ming Y. Lu, Bowen Chen, Andrew Zhang et al.
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization
Hantao Yao, Rui Zhang, Changsheng Xu
Visual Localization Using Imperfect 3D Models From the Internet
Vojtech Panek, Zuzana Kukelova, Torsten Sattler