Papers
VerifyMatch: A Semi-Supervised Learning Paradigm for Natural Language Inference with Confidence-Aware MixUp
Seo Yeon Park, Cornelia Caragea
VeriScore: Evaluating the factuality of verifiable claims in long-form text generation
Yixiao Song, Yekyung Kim, Mohit Iyyer
VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning
Meng Ziyang, Yu Dai, Zezheng Gong et al.
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation
Bocheng Zou, Mu Cai, Jianrui Zhang et al.
V-GlórIA - Customizing Large Vision and Language Models to European Portuguese
Afonso Simplício, David Semedo, Joao Magalhaes
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Jiliang Hu, Zuchao Li, Ping Wang et al.
Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain
David Ponce, Harritxu Gete, Thierry Etchegoyhen
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
Jiapeng Wang, Chengyu Wang, Kunzhe Huang et al.
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
Tsutomu Hirao, Naoki Kobayashi, Hidetaka Kamigaito et al.
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
Ruotong Liao, Max Erler, Huiyu Wang et al.
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin, Yang Ye, Bin Zhu et al.
VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation
Xuan He, Dongfu Jiang, Ge Zhang et al.
Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding
Heng Zhao, Zhao Yinjie, Bihan Wen et al.
VIEWS: Entity-Aware News Video Captioning
Hammad Ayyubi, Tianqi Liu, Arsha Nagrani et al.
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang, Willi Menapace, Aliaksandr Siarohin et al.
Virtual Context Enhancing Jailbreak Attacks with Special Token Injection
Yuqi Zhou, Lin Lu, Ryan Sun et al.
Virtual Personas for Language Models via an Anthology of Backstories
Suhong Moon, Marwa Abdulhai, Minwoo Kang et al.
Vision-Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Ming Li, Jike Zhong, Chenxin Li et al.
Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications
Oren Sultan, Alexander Khasin, Guy Shiran et al.
Visualising Changes in Semantic Neighbourhoods of English Noun Compounds over Time
Malak Rassem, Myrto Tsigkouli, Chris W. Jenkins et al.
Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs
Turghun Tayir, Lin Li, Xiaohui Tao et al.
Visual Prompting in LLMs for Enhancing Emotion Recognition
Qixuan Zhang, Zhifeng Wang, Dylan Zhang et al.
Visual Question Decomposition on Multimodal Large Language Models
Haowei Zhang, Jianzhe Liu, Zhen Han et al.
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Abhirama Subramanyam Penamakuri, Anand Mishra
VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values
Zhe Hu, Yixiao Ren, Jing Li et al.