Papers
VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
Hao Wang, Eiki Murata, Lingfang Zhang et al.
VirtualEnv: A Platform for Embodied AI Research
Kabir Swain, Sijie Han, Ayush Raina et al.
Virtual Multiplex Staining for Histological Images Using a Marker-Wise Conditioned Diffusion Model
Hyun-Jic Oh, Junsik Kim, Zhiyi Shi et al.
VisAssist: A Visually Impaired-Captured Video Question Answering Benchmark for Assistive Systems
Qi Gao, Heng Li, Yixin Zhou et al.
Vision-G1: Towards General Reasoning Vision-Language Models via Reinforcement Learning
Yuheng Zha, Kun Zhou, Yujia Wu et al.
Vision-language Incremental Learning with Dual Class-individual Memory
Fuhai Chen, Feng Zhang, XiaoGuang Ma et al.
Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis
Qihao Xu, Xiaoling Luo, Yuxin Lin et al.
Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach
Biao Wu, Meng Fang, Ling Chen et al.
Vision-MoR: Scaling Vision Transformer via Patch-Level Mixture-of-Recursions
Yunhong He, Zhengqing Yuan, Weixiang Sun et al.
Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction
Cheng Chen, Hao Huang, Saurabh Bagchi
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
Jiazheng Xu, Yu Huang, Jiale Cheng et al.
Vision Transformers Are Circulant Attention Learners
Dongchen Han, Tianyu Li, Ziyi Wang et al.
Vista: Scene-Aware Optimization for Streaming Video Question Answering Under Post-Hoc Queries
Haocheng Lu, Nan Zhang, Wei Tao et al.
Visual Bridge: Universal Visual Perception Representations Generating
Yilin Gao, Shuguang Dou, Junzhou Li et al.
Visual-Friendly Concept Protection via Selective Adversarial Perturbations
Xiaoyue Mi, Fan Tang, You Wu et al.
VitalDiagnosis: AI-Driven Ecosystem for 24/7 Vital Monitoring and Chronic Disease Management
Zhikai Xue, Tianqianjin Lin, Pengwei Yan et al.
VITA: Variational Pretraining of Transformers for Climate-Robust Crop Yield Forecasting
Adib Hasan, Mardavij Roozbehani, Munther A. Dahleh
ViTCoP: Accelerating Large Vision-Language Models via Visual and Textual Semantic Collaborative Pruning
Wen Luo, Peng Chen, Xiaotao Huang et al.
ViTE: Virtual Graph Trajectory Expert Router for Pedestrian Trajectory Prediction
Ruochen Li, Zhanxing Zhu, Tanqiu Qiao et al.
ViType: High-Fidelity Visual Text Rendering via Glyph-Aware Multimodal Diffusion
Lishuai Gao, Jun-Yan He, Yingsen Zeng et al.
VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
Shiying Li, Xingqun Qi, Bingkun Yang et al.
VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection
Jianhang Yao, Yongbin Zheng, Siqi Lu et al.
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
Yihao Wang, Pengxiang Ding, Lingxiao Li et al.
VMChill: A Dataset for Fine-Grained Visual-Musical Synergy
Xiaowei Chi, Zeyue Tian, Jialiang Chen et al.