Papers
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
Houlun Chen, Xin Wang, Hong Chen et al.
Verified Code Transpilation with LLMs
Sahil Bhatia, Jie Qiu, Niranjan Hasabnis et al.
Verified Safe Reinforcement Learning for Neural Network Dynamic Models
Junlin Wu, Huan Zhang, Yevgeniy Vorobeychik
VeXKD: The Versatile Integration of Cross-Modal Fusion and Knowledge Distillation for 3D Perception
Yuzhe JI, Yijie Chen, Liuqing Yang et al.
VFIMamba: Video Frame Interpolation with State Space Models
Guozhen Zhang, Chunxu Liu, Yutao Cui et al.
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee, Haoqin Tu, Chi Heem Wong et al.
Video Diffusion Models are Training-free Motion Interpreter and Controller
Zeqi Xiao, Yifan Zhou, Shuai Yang et al.
VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin Qinghong Lin, Linjie Li, Difei Gao et al.
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu, Joya Chen, Kevin Qinghong Lin et al.
VideoTetris: Towards Compositional Text-to-Video Generation
Ye Tian, Ling Yang, Haotian Yang et al.
Video Token Merging for Long Video Understanding
Seon-Ho Lee, Jue Wang, Zhikang Zhang et al.
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
Youpeng Wen, Junfan Lin, Yi Zhu et al.
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels
Yikai Wang, Xinzhou Wang, Zilong Chen et al.
ViLCo-Bench: VIdeo Language COntinual learning Benchmark
Tianqi Tang, Shohreh Deldari, Hao Xue et al.
Virtual Scanning: Unsupervised Non-line-of-sight Imaging from Irregularly Undersampled Transients
Xingyu Cui, Huanjing Yue, Song Li et al.
VISA: Variational Inference with Sequential Sample-Average Approximations
Heiko Zimmermann, Christian A. Naesseth, Jan-Willem van de Meent
Vision Foundation Model Enables Generalizable Object Pose Estimation
Kai Chen, Yiyao Ma, Xingyu Lin et al.
Vision-Language Models are Strong Noisy Label Detectors
Tong Wei, Hao-Tian Li, Chun-Shu Li et al.
Vision-Language Navigation with Energy-Based Policy
Rui Liu, Wenguan Wang, Yi Yang
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu, Muyan Zhong, Sen Xing et al.
Vision Mamba Mender
Jiacong Hu, Anda Cao, Zunlei Feng et al.
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang, Xizhou Zhu, Jinguo Zhu et al.
Vision Transformer Neural Architecture Search for Out-of-Distribution Generalization: Benchmark and Insights
Sy-Tuyen Ho, Tuan Van Vo, Somayeh Ebrahimkhani et al.
VisMin: Visual Minimal-Change Understanding
Rabiul Awal, Saba Ahmadi, Le Zhang et al.