Video Understanding

1592 directly classified papers

Papers

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures CVPR 2024

Comprehensive Visual Grounding for Video Description AAAI 2024

ViLCo-Bench: VIdeo Language COntinual learning Benchmark NeurIPS 2024

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors CVPR 2024

CSTA: CNN-based Spatiotemporal Attention for Video Summarization CVPR 2024

SnAG: Scalable and Accurate Video Grounding CVPR 2024

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos CVPR 2024

Abductive Ego-View Accident Video Understanding for Safe Driving Perception CVPR 2024

Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation CVPR 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos CVPR 2024

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives CVPR 2024

Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition CVPR 2024

TAPVid-3D: A Benchmark for Tracking Any Point in 3D NeurIPS 2024

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation CVPR 2024

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression AAAI 2024

Neighbor Relations Matter in Video Scene Detection CVPR 2024

Putting the Object Back into Video Object Segmentation CVPR 2024

Retrieval-Augmented Egocentric Video Captioning CVPR 2024

Active Speaker Detection in Fisheye Meeting Scenes with Scene Spatial Spectrums INTERSPEECH 2024

Learned Scanpaths Aid Blind Panoramic Video Quality Assessment CVPR 2024

Test-Time Zero-Shot Temporal Action Localization CVPR 2024

Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering CVPR 2024

Modular Blind Video Quality Assessment CVPR 2024

DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification AAAI 2024