Papers
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding
Yihang Zhu, Jinhao Zhang, Yuxuan Wang et al.
ViCTr: Vital Consistency Transfer for Pathology Aware Image Synthesis
Onkar Susladkar, Gayatri Deshmukh, Yalcin Tur et al.
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju, Shaofei Huang, Si Liu et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao, Feng Cheng, Lu Qi et al.
Video Color Grading via Look-Up Table Generation
Seunghyun Shin, Dongmin Shin, Jisu Shin et al.
Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han et al.
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
Video Motion Graphs
Haiyang Liu, Zhan Xu, Fa-Ting Hong et al.
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi et al.
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
Vid-Group: Temporal Video Grounding Pretraining from Unlabeled Videos in the Wild
Peijun Bao, Chenqi Kong, Siyuan Yang et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen et al.
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.