Papers
VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
ICCV 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
ICCV 2023
Compressing and Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
EMNLP 2023
What’s “up” with vision-language models? Investigating their struggle with spatial reasoning
EMNLP 2023