Papers
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Xueqing Wu, Zongyu Lin, Songyan Zhao et al.
V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization
Yuxi Xie, Guanzhen Li, Xiao Xu et al.
Vector Poetics: Parallel Couplet Detection in Classical Chinese Poetry
Maciej Kurzynski, Xiaotong Xu, Yu Feng
VE-KD: Vocabulary-Expansion Knowledge-Distillation for Training Smaller Domain-Specific Language Models
Pengju Gao, Tomohiro Yamasaki, Kazunori Imoto
Verba volant, scripta volant? Don’t worry! There are computational solutions for protoword reconstruction
Liviu P Dinu, Ana Sabina Uban, Alina Maria Cristea et al.
Verifiable, Debuggable, and Repairable Commonsense Logical Reasoning via LLM-based Theory Resolution
Armin Toroghi, Willis Guo, Ali Pesaranghader et al.
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving
Xin Quan, Marco Valentino, Louise A. Dennis et al.
VerifyMatch: A Semi-Supervised Learning Paradigm for Natural Language Inference with Confidence-Aware MixUp
Seo Yeon Park, Cornelia Caragea
VeriScore: Evaluating the factuality of verifiable claims in long-form text generation
Yixiao Song, Yekyung Kim, Mohit Iyyer
VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning
Meng Ziyang, Yu Dai, Zezheng Gong et al.
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
Bocheng Zou, Mu Cai, Jianrui Zhang et al.
V-GlórIA - Customizing Large Vision and Language Models to European Portuguese
Afonso Simplício, David Semedo, Joao Magalhaes
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Jiliang Hu, Zuchao Li, Ping Wang et al.
Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain
David Ponce, Harritxu Gete, Thierry Etchegoyhen
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang, Chengyu Wang, Kunzhe Huang et al.
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
Tsutomu Hirao, Naoki Kobayashi, Hidetaka Kamigaito et al.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
Ruotong Liao, Max Erler, Huiyu Wang et al.
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu et al.
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He, Dongfu Jiang, Ge Zhang et al.
Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding
Heng Zhao, Zhao Yinjie, Bihan Wen et al.
VIEWS: Entity-Aware News Video Captioning
Hammad Ayyubi, Tianqi Liu, Arsha Nagrani et al.
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang, Willi Menapace, Aliaksandr Siarohin et al.
Virtual Context Enhancing Jailbreak Attacks with Special Token Injection
Yuqi Zhou, Lin Lu, Ryan Sun et al.
Virtual Personas for Language Models via an Anthology of Backstories
Suhong Moon, Marwa Abdulhai, Minwoo Kang et al.
Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Ming Li, Jike Zhong, Chenxin Li et al.