Video Understanding

1098 directly classified papers

Papers

Learning to Segment Referred Objects from Narrated Egocentric Videos CVPR 2024

LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation CVPR 2024

SyncVIS: Synchronized Video Instance Segmentation NeurIPS 2024

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos CVPR 2024

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation AAAI 2024

Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering CVPR 2024

OmniViD: A Generative Framework for Universal Video Understanding CVPR 2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences ACL 2024

Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation CVPR 2024

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models ACL 2024

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation CVPR 2024

Encoding and Controlling Global Semantics for Long-form Video Question Answering EMNLP 2024

OnlineTAS: An Online Baseline for Temporal Action Segmentation NeurIPS 2024

Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding EMNLP 2024

VideoCon: Robust Video-Language Alignment via Contrast Captions CVPR 2024

Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection CVPR 2024

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly CVPR 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos CVPR 2024

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos NeurIPS 2024

M2Beats: When Motion Meets Beats in Short-form Videos IJCAI 2024

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition CVPR 2024

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection CVPR 2024

Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment CVPR 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos CVPR 2024