Papers
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
Hao Peng, Yunjia Qi, Xiaozhi Wang et al.
VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM
Lesheng Jin, Zhenyuan Ruan, Haohui Mai et al.
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
Tingqiao Xu, Ziru Zeng, Jiayu Chen
Versatile Framework for Song Generation with Prompt-based Control
Yu Zhang, Wenxiang Guo, Changhao Pan et al.
VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
Keer Lu, Keshi Zhao, Zhuoran Zhang et al.
VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings
Tanmana Sadhu, Yanan Chen, Ali Pesaranghader
Viability of Machine Translation for Healthcare in Low-Resourced Languages
Hellina Hailu Nigatu, Nikita Mehandru, Negasi Haile Abadi et al.
VIBE: Can a VLM Read the Room?
Tania Chakraborty, Eylon Caplan, Dan Goldwasser
ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos
Patrick Giedemann, Pius von Däniken, Jan Milan Deriu et al.
Vicomtech@WMT 2025: Evolutionary Model Compression for Machine Translation
David Ponce, Harritxu Gete, Thierry Etchegoyhen
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
Xueqiao Zhang, Chao Zhang, Jingtao Xu et al.
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
Xuyang Liu, Yiyu Wang, Junpeng Ma et al.
VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu, Jinghuai Zhang, Changjiang Li et al.
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Yueqian Wang, Xiaojun Meng, Yuxuan Wang et al.
VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
Yogesh Kulkarni, Pooyan Fazli
Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
Ziyang Wang, Jaehong Yoon, Shoubin Yu et al.
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
Daeun Lee, Jaehong Yoon, Jaemin Cho et al.
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Qiuchen Wang, Ruixue Ding, Zehui Chen et al.
ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning
Yichen Lu, Wei Dai, Jiaen Liu et al.
ViFT: Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
Zikang Liu, Kun Zhou, Wayne Xin Zhao et al.
ViLBench: A Suite for Vision-Language Process Reward Modeling
Haoqin Tu, Weitao Feng, Hardy Chen et al.
ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding
Shichen Lu, Tongtian Yue, Longteng Guo et al.
VISaGE: Understanding Visual Generics and Exceptions
Stella Frank, Emily Allaway
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
Jen-tse Huang, Jiantong Qin, Jianping Zhang et al.
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Yuansheng Ni, Ping Nie, Kai Zou et al.