Video Understanding

1592 directly classified papers

Papers

Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval WACV 2024

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures CVPR 2024

VideoGUI: A Benchmark for GUI Automation from Instructional Videos NIPS 2024

GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval AAAI 2024

Weakly-Supervised Representation Learning for Video Alignment and Analysis WACV 2024

DiffusionTrack: Diffusion Model for Multi-Object Tracking AAAI 2024

CycleCL: Self-Supervised Learning for Periodic Videos WACV 2024

Advancing Video Anomaly Detection: A Concise Review and a New Dataset NIPS 2024

Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors CVPR 2024

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection AAAI 2024

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis AAAI 2024

A multimodal analysis of different types of laughter expression in conversational dialogues INTERSPEECH 2024

Repetitive Action Counting With Motion Feature Learning WACV 2024

MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features INTERSPEECH 2024

SAVSR: Arbitrary-Scale Video Super-Resolution via a Learned Scale-Adaptive Network AAAI 2024

Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos WACV 2024

TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation NIPS 2024

Diving Deep into the Motion Representation of Video-Text Models ACL 2024

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding AAAI 2024

ODTrack: Online Dense Temporal Token Learning for Visual Tracking AAAI 2024

SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer AAAI 2024

Sequential Transformer for End-to-End Video Text Detection WACV 2024

Exploiting Auxiliary Caption for Video Grounding AAAI 2024

DeVos: Flow-Guided Deformable Transformer for Video Object Segmentation WACV 2024

CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos NIPS 2024