Papers
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
EMNLP 2022
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
NeurIPS 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
CVPR 2022