visual question answering
1000 papers
Also known as
VQAI
OK-VQA
VQA
VIDEOQA
TEXTVQA
IMAGEQA
Papers
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
EMNLP 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
CVPR 2025
Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models
COLING 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
CVPR 2025
End-to-End Multi-Modal Diffusion Mamba
ICCV 2025
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
EMNLP 2025
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
ICCV 2025