image captioning

728 papers

Also known as

IDC PIC IAC IC

Co-occurring keywords

multimodal learning (4622)
visual question answering (1000)
vision-language model (2235)
text generation (2903)
attention mechanism (3975)
visual grounding (505)
zero-shot learning (3637)
multi-modal learning (1276)
vision language model (752)
natural language generation (782)

Papers

Landsat30-AU: A Vision-Language Dataset for Australian Landsat Imagery AAAI 2026

ChartQA-X: Generating Explanations for Visual Chart Reasoning WACV 2026

AfriCaption: Establishing a New Paradigm for Image Captioning in African Languages EACL 2026

Knowledge-Enhanced Image Captioning with Adaptive Graph-based Multimodal Alignment and LLM AAAI 2026

Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters WACV 2026

LASOR: Towards Clinically Transparent and Explainable Ophthalmic Report Generation via Lesion-Aware Segmentation WACV 2026

A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions WACV 2026

AGIC: Attention-Guided Image Captioning to Improve Caption Relevance EACL 2026

Knowledge Completes the Vision: A Multimodal Entity-aware Retrieval-Augmented Generation Framework for News Image Captioning AAAI 2026

TextGround4M: A Prompt-Aligned Dataset for Layout-Aware Text Rendering AAAI 2026

Top-Down Semantic Refinement for Image Captioning AAAI 2026

JEEM: Vision-Language Understanding in Four Arabic Dialects EACL 2026

Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives IJCAI 2025

Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output COLING 2025

MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects CVPR 2025

Caption Generation in Cultural Heritage: Crowdsourced Data and Tuning Multimodal Large Language Models NAACL 2025

EXPERT: An Explainable Image Captioning Evaluation Metric with Structured Explanations ACL 2025

Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models CVPR 2025

Zero-Shot Image Captioning with Multi-type Entity Representations AAAI 2025

JNLP at SemEval-2025 Task 1: Multimodal Idiomaticity Representation with Large Language Models ACL 2025

ImageEval 2025: The First Arabic Image Captioning Shared Task EMNLP 2025

Cross-modal Clustering-based Retrieval for Scalable and Robust Image Captioning ACL 2025

ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting IJCNLP 2025

ViPCap: Retrieval Text-Based Visual Prompts for Lightweight Image Captioning AAAI 2025

Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data IJCNLP 2025