Papers
VIRES: Video Instance Repainting via Sketch and Text Guided Generation
Shuchen Weng, Haojie Zheng, Peixuan Zhang et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Koki Mashita et al.
Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes
Ting Yu, Yi Lin, Jun Yu et al.
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang, Guoyu Lu
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou, Yiyao Zhu et al.
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Senqiao Yang, Yukang Chen, Zhuotao Tian et al.
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
Yufan He, Pengfei Guo, Yucheng Tang et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network
Kang You, Ziling Wei, Jing Yan et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Wenyang Luo, Haina Qin, Zewen Chen et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, Duo Peng et al.
Visual Representation Learning through Causal Intervention for Controllable Image Editing
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
VITED: Video Temporal Evidence Distillation
Yujie Lu, Yale Song, William Wang et al.
ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, Silvio Savarese et al.
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
Jinseong Jang, Chunfei Ma, Byeongwon Lee
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.