visual question answering

1000 papers

Also known as

VQAI, OK-VQA, VQA, VIDEOQA, TEXTVQA, IMAGEQA

Co-occurring keywords

multimodal learning (4622), vision-language model (2235), image captioning (728), vision language model (752), multi-modal learning (1276), multimodal large language model (865), large language model (12755), visual reasoning (479), attention mechanism (3975), benchmark evaluation (1539)

Papers

JEEM: Vision-Language Understanding in Four Arabic Dialects EACL 2026

MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding AAAI 2026

BanglaProtha: Evaluating Vision Language Models in Underrepresented Long-tail Cultural Contexts WACV 2026

Benchmarking and Mitigating the Impact of Noisy User Prompts in Medical VLMs via Cross-Modal Reflection EACL 2026

Compositional Reasoning via Joint Image and Language Decomposition EACL 2026

LLaVA³: Representing 3D Scenes Like a Cubist Painter to Boost 3D Scene Understanding of VLMs AAAI 2026

MapVerse: A Benchmark for Geospatial Question Answering on Diverse Real-World Maps WACV 2026

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance WACV 2026

Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models EACL 2026

Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision–Language Models EACL 2026

DRIVINGVQA: A Dataset for Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios EACL 2026

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding EACL 2026

MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment EACL 2026

Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering EACL 2026

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM WACV 2026

ChartQA-X: Generating Explanations for Visual Chart Reasoning WACV 2026

Relevance-aware Multi-context Contrastive Decoding for Retrieval-augmented Visual Question Answering WACV 2026

DermEVAL: A Dermatologist-Reviewed Benchmark for Multimodal Large Language Models WACV 2026

Direct Visual Grounding by Directing Attention of Visual Tokens WACV 2026

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering WACV 2026

CHROMIC: Chronological Reasoning Across Multi-Panel Comics EACL 2026

Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs EACL 2026

VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery EACL 2026

Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone EACL 2026

ViType: High-Fidelity Visual Text Rendering via Glyph-Aware Multimodal Diffusion AAAI 2026