image captioning

728 papers

Also known as

IDC, PIC, IAC, IC

Co-occurring keywords

multimodal learning (4622), visual question answering (1000), vision-language model (2235), text generation (2903), attention mechanism (3975), visual grounding (505), zero-shot learning (3637), multi-modal learning (1276), vision language model (752), natural language generation (782)

Papers

UNISON: Unpaired Cross-Lingual Image Captioning AAAI 2022

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning NIPS 2022

Multimodal Generation of Radiology Reports using Knowledge-Grounded Extraction of Entities and Relations AACL 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation ICML 2022

Show, Deconfound and Tell: Image Captioning With Causal Inference CVPR 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training EMNLP 2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners NIPS 2022

Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand NAACL 2022

Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset EMNLP 2022

CapOnImage: Context-driven Dense-Captioning on Image EMNLP 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections EMNLP 2022

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective EMNLP 2022

What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs NIPS 2022

Focus! Relevant and Sufficient Context Selection for News Image Captioning EMNLP 2022

Learning Distinct and Representative Modes for Image Captioning NIPS 2022

Concadia: Towards Image-Based Text Generation with a Purpose EMNLP 2022

Text-Only Training for Image Captioning using Noise-Injected CLIP EMNLP 2022

Inference of captions from histopathological patches MIDL 2022

Flamingo: a Visual Language Model for Few-Shot Learning NIPS 2022

Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning CVPR 2022

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework ICML 2022

Weakly-Supervised Generation and Grounding of Visual Descriptions With Conditional Generative Models CVPR 2022

Scaling Up Vision-Language Pre-Training for Image Captioning CVPR 2022

Comprehending and Ordering Semantics for Image Captioning CVPR 2022

DeeCap: Dynamic Early Exiting for Efficient Image Captioning CVPR 2022