Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
image captioning
728 papers
Also known as
IDC
PIC
IAC
IC
Co-occurring keywords
multimodal learning (4622)
visual question answering (1000)
vision-language model (2235)
text generation (2903)
attention mechanism (3975)
visual grounding (505)
zero-shot learning (3637)
multi-modal learning (1276)
vision language model (752)
natural language generation (782)
Papers
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
CVPR 2024
Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning
AAAI 2024
BCAmirs at SemEval-2024 Task 4: Beyond Words: A Multimodal and Multilingual Exploration of Persuasion in Memes
NAACL 2024
LAMBDA: Large Language Model-Based Data Augmentation for Multi-Modal Machine Translation
EMNLP 2024
ImageInWords: Unlocking Hyper-Detailed Image Descriptions
EMNLP 2024
PAELLA: Parameter-Efficient Lightweight Language-Agnostic Captioning Model
NAACL 2024
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?
NAACL 2024
Context-aware Difference Distilling for Multi-change Captioning
ACL 2024
Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models
WACV 2024
Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes
EACL 2024
Visually-Aware Context Modeling for News Image Captioning
NAACL 2024
DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task
EMNLP 2024
ALOHa: A New Measure for Hallucination in Captioning Models
NAACL 2024
Direct Metric Optimization for Image Captioning through Reward-Weighted Augmented Data Utilization
ACL 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
EMNLP 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
EMNLP 2024
Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding
ACL 2024
Cycle-Consistency Learning for Captioning and Grounding
AAAI 2024
SciOL and MuLMS-Img: Introducing a Large-Scale Multimodal Scientific Dataset and Models for Image-Text Tasks in the Scientific Domain
WACV 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
CVPR 2024
Semantic Map-based Generation of Navigation Instructions
COLING 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
AAAI 2024
Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes
EMNLP 2024
Text360Nav: 360-Degree Image Captioning Dataset for Urban Pedestrians Navigation
COLING 2024