Papers
11,015 papers found
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.
Visually Consistent Hierarchical Image Classification
Seulki Park, Youren Zhang, Stella X. Yu et al.
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim, Minji Bae, Kyuhong Shim et al.
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, YuTao Fan, Lei Zhang et al.
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang, Nishanth Kumar, Hao Tang et al.
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu et al.
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang, Rui Meng, Xinyi Yang et al.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
Beichen Li, Rundi Wu, Armando Solar-Lezama et al.
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz, Maitreya Patel, Yiran Lawrence Luo et al.
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
Xize Cheng, Ruofan Hu, Xiaoda Yang et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H. Beluch, Margret Keuper et al.
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Xudong Gong, Feng Dawei, Kele Xu et al.
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton, Robert Ness, John Guttag et al.
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader et al.
WardropNet: Traffic Flow Predictions via Equilibrium-Augmented Learning
Kai Jungel, Dario Paccagnan, Axel Parmentier et al.
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models
Hao-Chien Hsueh, Wen-Hsiao Peng, Ching-Chun Huang
Wasserstein Distances, Neuronal Entanglement, and Sparsity
Shashata Sawmya, Linghao Kong, Ilia Markov et al.
Wasserstein-Regularized Conformal Prediction under General Distribution Shift
Rui Xu, Chao Chen, Yue Sun et al.
Watch Less, Do More: Implicit Skill Discovery for Video-Conditioned Policy
Jiangxing Wang, Zongqing Lu
Watermark Anything With Localized Messages
Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.
Wavelet-based Positional Representation for Long Context
Yui Oka, Taku Hasegawa, Kyosuke Nishida et al.
Wavelet Diffusion Neural Operator
Peiyan Hu, Rui Wang, Xiang Zheng et al.