Video Understanding

1592 directly classified papers

Papers

Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting EMNLP 2024

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning CVPR 2024

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation CVPR 2024

Reconsidering Sentence-Level Sign Language Translation EMNLP 2024

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space CVPR 2024

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection CVPR 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding NIPS 2024

Context-Guided Spatio-Temporal Video Grounding CVPR 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models EMNLP 2024

CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series AAAI 2024

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression AAAI 2024

Motion-Aware Heatmap Regression for Human Pose Estimation in Videos IJCAI 2024

N-gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding AAAI 2024

Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding AAAI 2024

Context Enhanced Transformer for Single Image Object Detection in Video Data AAAI 2024

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision AAAI 2024

Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models CVPR 2024

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language AAAI 2024

FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding CVPR 2024

Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval IJCAI 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge CVPR 2024

Comprehensive Visual Grounding for Video Description AAAI 2024

Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models ECCV 2024

Segment Any Change NIPS 2024

Movie Genre Classification by Language Augmentation and Shot Sampling WACV 2024