Papers
Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach
Minting Pan, Yitao Zheng, Jiajian Li et al.
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
Hila Chefer, Uriel Singer, Amit Zohar et al.
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu, Yanjiang Guo, Pengchao Wang et al.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Xilin Wei, Xiaoran Liu, Yuhang Zang et al.
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun, Yudong Yang, Jimin Zhuang et al.
VinePPO: Refining Credit Assignment in RL Training of LLMs
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance et al.
Vintix: Action Model via In-Context Reinforcement Learning
Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin et al.
VIP: Vision Instructed Pre-training for Robotic Manipulation
Zhuoling Li, Liangliang Ren, Jinrong Yang et al.
Vision Graph Prompting via Semantic Low-Rank Decomposition
Zixiang Ai, Zichen Liu, Jiahuan Zhou
Vision-Language Models Create Cross-Modal Task Representations
Grace Luo, Trevor Darrell, Amir Bar
Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li et al.
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters
Mouxiang Chen, Lefei Shen, Zhuo Li et al.
Visual Abstraction: A Plug-and-Play Approach for Text-Visual Retrieval
Guofeng Ding, Yiding Lu, Peng Hu et al.
Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning
Rina Bao, Shilong Dong, Zhenfang Chen et al.
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung, Saehyung Lee, Eunji Kim et al.
Visual Autoregressive Modeling for Image Super-Resolution
Yunpeng Qu, Kun Yuan, Jinhua Hao et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Zahra Babaiee, Peyman Kiasari, Daniela Rus et al.
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik et al.
Volume-Aware Distance for Robust Similarity Learning
Shuo Chen, Chen Gong, Jun Li et al.
Volume Optimality in Conformal Prediction with Structured Prediction Sets
Chao Gao, Liren Shan, Vaidehi Srinivas et al.
Voronoi-grid-based Pareto Front Learning and Its Application to Collaborative Federated Learning
Mengmeng Chen, Xiaohu Wu, Qiqi Liu et al.
VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians
Pengchong Hu, Zhizhong Han
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
Liang Chen, Xueting Han, Li Shen et al.
Wait-Less Offline Tuning and Re-solving for Online Decision Making
Jingruo Sun, Wenzhi Gao, Ellen Vitercik et al.