Video Understanding

1098 directly classified papers

Papers

Commonsense for Zero-Shot Natural Language Video Localization AAAI 2024

Transferable Video Moment Localization by Moment-Guided Query Prompting AAAI 2024

Open-Vocabulary Video Anomaly Detection CVPR 2024

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes NIPS 2024

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models ACL 2024

BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind AAAI 2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences ACL 2024

Context Enhanced Transformer for Single Image Object Detection in Video Data AAAI 2024

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos NIPS 2024

ViLCo-Bench: VIdeo Language COntinual learning Benchmark NIPS 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models EMNLP 2024

OnlineTAS: An Online Baseline for Temporal Action Segmentation NIPS 2024

Continuous Product Graph Neural Networks NIPS 2024

Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding AAAI 2024

Encoding and Controlling Global Semantics for Long-form Video Question Answering EMNLP 2024

MatchTime: Towards Automatic Soccer Game Commentary Generation EMNLP 2024

Exploring Union and Intersection of Visual Regions for Generating Questions, Answers, and Distractors EMNLP 2024

Towards a Complete Benchmark on Video Moment Localization AISTATS 2024

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis AAAI 2024

SyncVIS: Synchronized Video Instance Segmentation NIPS 2024

LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer AAAI 2024

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression AAAI 2024

Semi-supervised Active Learning for Video Action Detection AAAI 2024

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos NIPS 2024

ActFusion: a Unified Diffusion Model for Action Segmentation and Anticipation NIPS 2024