Papers
VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition
Hongbo Jin, Kuanwei Lin, Wenhao Zhang et al.
Video-MMMU: Evaluating Knowledge Acquisition from Multidisciplinary Professional Videos
Kairui Hu, Penghao Wu, Fanyi Pu et al.
VideoPro: Adaptive Program Reasoning for Long Video Understanding
Chenglin Li, Feng Han, Yikun Wang et al.
VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG
Honghao Fu, Miao Xu, Yiwei Wang et al.
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios
António Loison, Quentin Macé, Antoine Edy et al.
VIGIL: Defending LLM Agents Against Tool-Stream Injection via Verify-Before-Commit
Junda Lin, Zhaomeng Zhou, Zhi Zheng et al.
VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models
Chahat Raj, Bowen Wei, Aylin Caliskan et al.
ViLL-E: Video LLM Embeddings for Retrieval
Rohit Gupta, Jayakrishnan Unnikrishnan, Fan Fei et al.
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
Jingkun Ma, Runzhe Zhan, Yang Li et al.
VishBox v2: A Multi-Agent System for Adaptive Voice Phishing Simulation
Sungmi Park, Daon Choi, Yoonmo Yang et al.
Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering
Shuliang Liu, Songbo Yang, Dong Fang et al.
VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models
Huawei Ji, Yuanhao Sun, Yuan Jin et al.
VisRet: Visualization Improves Knowledge-Intensive Text-to-Image Retrieval
Di Wu, Yixin Wan, Kai-Wei Chang
Vista-LLM: Decoupled Query-Guided Visual Token Pruning for Efficient Long-Video Large Language Models
Zhenyu Li, Zuchao Li, Ping Wang et al.
VISTA: Verification In Sequential Turn-based Assessment
Ashley Lewis, Andrew Perrault, Eric Fosler-Lussier et al.
Visual and Memory–Augmented Soccer Commentary Generation
Haoran Sun, Natthawut Kertkeidkachorn, Kiyoaki Shirai
Visual Attention Reasoning via Hierarchical Search and Self-Verification
Wei Cai, Jian Zhao, Yuchen Yuan et al.
Visually-Guided Policy Optimization for Multimodal Reasoning
Zengbin Wang, Feng Xiong, Liang Lin et al.
Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
Qishun Yang, Shu Yang, Lijie Hu et al.
VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
Wenyi Xiao, Xinchi Xu, Leilei Gan
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation Agents
Xunyi Zhao, Gengze Zhou, Qi Wu
VLN-NF: Feasibility-Aware Vision-and-Language Navigation with False-Premise Instructions
Hung-Ting Su, Ting-Jun Wang, Jia-Fong Yeh et al.
Vocabulary Hijacking in LVLMs: Unveiling Critical Attention Heads by Excluding Inert Tokens to Mitigate Hallucination
Yangneng Chen, Junlin Li, Weijun Yao et al.
Vocabulary Shapes Cross-Lingual Variation of Word-Order Learnability in Language Models
Jonas Mayer Martins, Jaap Jumelet, Viola Priesemann et al.