conftrace_

visual question answering

1000 papers

Also known as

VQA

Co-occurring keywords

multimodal learning (4622) vision-language model (2235) image captioning (728) vision language model (752) multi-modal learning (1276) multimodal large language model (865) large language model (12755) visual reasoning (479) attention mechanism (3975) benchmark evaluation (1539)

Papers

ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models ACL 2025

SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data ACL 2025

Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts ACL 2025

MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression ACL 2025

IntelliCockpitBench: A Comprehensive Benchmark to Evaluate VLMs for Intelligent Cockpit ACL 2025

Analyzing the Sensitivity of Vision Language Models in Visual Question Answering ACL 2025

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ACL 2025

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts ACL 2025

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation AAAI 2025

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models AAAI 2025

V-Oracle: Making Progressive Reasoning in Deciphering Oracle Bones for You and Me ACL 2025

Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling ACL 2025

FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs CVPR 2025

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation AAAI 2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation EMNLP 2025

McHirc: A Multimodal Benchmark for Chinese Idiom Reading Comprehension AAAI 2025

VQA-Augmented Machine Translation with Cross-Modal Contrastive Learning EMNLP 2025

NLKI: A Lightweight Natural Language Knowledge Integration Framework for Improving Small VLMs in Commonsense VQA Tasks EMNLP 2025

Attribution and Application of Multiple Neurons in Multimodal Large Language Models EMNLP 2025

Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics EMNLP 2025

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities EMNLP 2025

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion EMNLP 2025

Can VLMs Recall Factual Associations From Visual References? EMNLP 2025

Debating for Better Reasoning in Vision-Language Models EMNLP 2025

Seeing More with Less: Human-like Representations in Vision Models CVPR 2025