Video Understanding

1098 directly classified papers

Papers

Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition WACV 2024

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding AAAI 2024

HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model NIPS 2024

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos NIPS 2024

Transferable Video Moment Localization by Moment-Guided Query Prompting AAAI 2024

Animal-Bench: Benchmarking Multimodal Video Models for Animal-centric Video Understanding NIPS 2024

TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation NIPS 2024

VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression AAAI 2024

Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation AAAI 2024

Generalizable Implicit Motion Modeling for Video Frame Interpolation NIPS 2024

Vript: A Video Is Worth Thousands of Words NIPS 2024

Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language AAAI 2024

Multi-view Masked Contrastive Representation Learning for Endoscopic Video Analysis NIPS 2024

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding NIPS 2024

Learning Multi-Scale Video-Text Correspondence for Weakly Supervised Temporal Article Grounding AAAI 2024

Efficient Temporal Action Segmentation via Boundary-aware Query Voting NIPS 2024

Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection NIPS 2024

Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark AAAI 2024

Video Token Merging for Long Video Understanding NIPS 2024

Slot-VLM: Object-Event Slots for Video-Language Modeling NIPS 2024

ViSTec: Video Modeling for Sports Technique Recognition and Tactical Analysis AAAI 2024

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision AAAI 2024

Commonsense for Zero-Shot Natural Language Video Localization AAAI 2024

TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation AAAI 2024

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes NIPS 2024