Papers
V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
VisHall3D: Monocular Semantic Scene Completion from Reconstructing the Visible Regions to Hallucinating the Invisible Regions
Haoang Lu, Yuanqi Su, Xiaoning Zhang et al.
Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation
Yukuan Min, Muli Yang, Jinhao Zhang et al.
Vision-Language Models Can't See the Obvious
Ngoc Dung Huynh, Phuc H Le-Khac, Wamiq Reyaz Para et al.
Vision-Language Neural Graph Featurization for Extracting Retinal Lesions
Taimur Hassan, Anabia Sohail, Muzammal Naseer et al.
VisionMath: Vision-Form Mathematical Problem-Solving
Zongyang Ma, Yuxin Chen, Ziqi Zhang et al.
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon, Jong Chul Ye
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen, Xufang Luo, Dongsheng Li
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Songyou Peng, Kyle Genova et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun, Junbin Xiao, Tze Ho Elden Tse et al.
Visual Interestingness Decoded: How GPT-4o Mirrors Human Interests
Fitim Abdullahu, Helmut Grabner
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor R. Medeiros, Atif Belal, Srikanth Muralidharan et al.
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang et al.
Visual Relation Diffusion for Human-Object Interaction Detection
Ping Cao, Yepeng Tang, Chunjie Zhang et al.
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu, Zeyi Sun, Yuhang Zang et al.
Visual Surface Wave Elastography: Revealing Subsurface Physical Properties via Visible Surface Waves
Alexander C. Ogren, Berthy T. Feng, Jihoon Ahn et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
Visual Textualization for Image Prompted Object Detection
Yongjian Wu, Yang Zhou, Jiya Saiyin et al.
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
Ada Görgün, Bernt Schiele, Jonas Fischer
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.