visual question answering

1000 papers

Also known as

VQAI, OK-VQA, VQA, VIDEOQA, TEXTVQA, IMAGEQA

Co-occurring keywords

multimodal learning (4622), vision-language model (2235), image captioning (728), vision language model (752), multi-modal learning (1276), multimodal large language model (865), large language model (12755), visual reasoning (479), attention mechanism (3975), benchmark evaluation (1539)

Papers

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations NAACL 2024

Overview of the MEDIQA-M3G 2024 Shared Task on Multilingual Multimodal Medical Answer Generation NAACL 2024

WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models NAACL 2024

What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models CVPR 2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs NeurIPS 2024

Can LLM’s Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis NAACL 2024

The Illusion of Competence: Evaluating the Effect of Explanations on Users’ Mental Models of Visual Question Answering Systems EMNLP 2024

ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments EMNLP 2024

Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant EMNLP 2024

VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models AAAI 2024

MLeVLM: Improve Multi-level Progressive Capabilities based on Multimodal Large Language Model for Medical Visual Question Answering ACL 2024

MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing ACL 2024

CIC: A Framework for Culturally-Aware Image Captioning IJCAI 2024

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models CVPR 2024

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts CVPR 2024

FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts ACL 2024

i-Code Studio: A Configurable and Composable Framework for Integrative AI EMNLP 2024

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models NeurIPS 2024

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos CVPR 2024

Modality-Aware Integration with Large Language Models for Knowledge-Based Visual Question Answering ACL 2024

ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning ACL 2024

MedCoT: Medical Chain of Thought via Hierarchical Expert EMNLP 2024

TM-PATHVQA: 90000+ Textless Multilingual Questions for Medical Visual Question Answering INTERSPEECH 2024

Can I Trust Your Answer? Visually Grounded Video Question Answering CVPR 2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM CVPR 2024