Research Explorer
Image Captioning
781 directly classified papers
Papers per year
2003: 1
2008: 1
2011: 1
2012: 1
2013: 5
2014: 2
2015: 21
2016: 17
2017: 36
2018: 47
2019: 92
2020: 73
2021: 96
2022: 91
2023: 107
2024: 86
2025: 96
2026: 8
Papers
Show, Deconfound and Tell: Image Captioning With Causal Inference
CVPR 2022
CLIP4IDC: CLIP for Image Difference Captioning
IJCNLP 2022
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
CVPR 2022
Text-Only Training for Image Captioning using Noise-Injected CLIP
EMNLP 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
EMNLP 2022
Concadia: Towards Image-Based Text Generation with a Purpose
EMNLP 2022
Scaling Up Vision-Language Pre-Training for Image Captioning
CVPR 2022
Comprehending and Ordering Semantics for Image Captioning
CVPR 2022
Grounding Answers for Visual Questions Asked by Visually Impaired People
CVPR 2022
DIFNet: Boosting Visual Information Flow for Image Captioning
CVPR 2022
Less Is More: Generating Grounded Navigation Instructions From Landmarks
CVPR 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
EMNLP 2022
Hierarchical Modular Network for Video Captioning
CVPR 2022
Prediction of People’s Emotional Response towards Multi-modal News
IJCNLP 2022
NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge
CVPR 2022
MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning
AAAI 2022
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
EMNLP 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
EMNLP 2022
CapOnImage: Context-driven Dense-Captioning on Image
EMNLP 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
EMNLP 2022
L-Verse: Bidirectional Generation Between Image and Text
CVPR 2022
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
CVPR 2022
End-to-End Generative Pretraining for Multimodal Video Captioning
CVPR 2022
How do people talk about images? A study on open-domain conversations with images.
NAACL 2022
Combine to Describe: Evaluating Compositional Generalization in Image Captioning
ACL 2022