video captioning

206 papers

Also known as

MCN

Co-occurring keywords

video understanding (1647), multimodal learning (4622), image captioning (728), recurrent neural network (1790), video description (25), action recognition (957), attention mechanism (3975), natural language generation (782), vision-language model (2235), contrastive learning (3979)

Papers

Vript: A Video Is Worth Thousands of Words NIPS 2024

OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning NIPS 2024

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning CVPR 2024

Previously on ... From Recaps to Story Summarization CVPR 2024

HourVideo: 1-Hour Video-Language Understanding NIPS 2024

CALVIN: Improved Contextual Video Captioning via Instruction Tuning NIPS 2024

UNICORN: A Unified Causal Video-Oriented Language-Modeling Framework for Temporal Video-Language Tasks EMNLP 2024

Retrieval-Augmented Egocentric Video Captioning CVPR 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video CVPR 2024

Distilling Vision-Language Models on Millions of Videos CVPR 2024

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding CVPR 2024

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding CVPR 2024

DeVAn: Dense Video Annotation for Video-Language Models ACL 2024

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding NIPS 2024

Streaming Dense Video Captioning CVPR 2024

Comprehensive Visual Grounding for Video Description AAAI 2024

Stitching Segments and Sentences towards Generalization in Video-Text Pre-training AAAI 2024

Abstractive Multi-Video Captioning: Benchmark Dataset Construction and Extensive Evaluation COLING 2024

Unveiling the Invisible: Captioning Videos with Metaphors EMNLP 2024

AutoAD III: The Prequel - Back to the Pixels CVPR 2024

Set Prediction Guided by Semantic Concepts for Diverse Video Captioning AAAI 2024

Text With Knowledge Graph Augmented Transformer for Video Captioning CVPR 2023

Exploring Group Video Captioning with Efficient Relational Approximation ICCV 2023

Hierarchical Video-Moment Retrieval and Step-Captioning CVPR 2023

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval ICCV 2023