Papers
U-ViLAR: Uncertainty-Aware Visual Localization for Autonomous Driving via Differentiable Association and Registration
Xiaofan Li, Zhihao Xu, Chenming Wu et al.
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
Jianqi Chen, Biao Zhang, Xiangjun Tang et al.
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Junqi Ge, Ziyi Chen, Jintao Lin et al.
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.
V2XScenes: A Multiple Challenging Traffic Conditions Dataset for Large-Range Vehicle-Infrastructure Collaborative Perception
Bowen Wang, Yafei Wang, Wei Gong et al.
VACE: All-in-One Video Creation and Editing
Zeyinzi Jiang, Zhen Han, Chaojie Mao et al.
VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching
Xihua Wang, Xin Cheng, Yuyue Wang et al.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas, Edward Fish, Richard Bowden
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang et al.
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen, Han Tao, Guo Song et al.
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
VCA: Video Curious Agent for Long Video Understanding
Zeyuan Yang, Delin Chen, Xueyang Yu et al.
Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
Yuting He, Shuo Li
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
Shoubin Yu, Difan Liu, Ziqiao Ma et al.
VehicleMAE: View-asymmetry Mutual Learning for Vehicle Re-identification Pre-training via Masked AutoEncoders
Qi Wang, Zeyu Zhang, Dong Wang et al.
Verbalized Representation Learning for Interpretable Few-Shot Generalization
Cheng-Fu Yang, Da Yin, Wenbo Hu et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Yihang Zhu, Jinhao Zhang, Yuxuan Wang et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju, Shaofei Huang, Si Liu et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.