Video Understanding

1592 directly classified papers

Papers

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning CVPR 2023

The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction CVPR 2023

Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos INTERSPEECH 2023

Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization CVPR 2023

Efficient Movie Scene Detection Using State-Space Transformers CVPR 2023

ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding CVPR 2023

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation CVPR 2023

LOGO: A Long-Form Video Dataset for Group Action Quality Assessment CVPR 2023

Behavioral Analysis of Vision-and-Language Navigation Agents CVPR 2023

Align and Attend: Multimodal Summarization With Dual Contrastive Losses CVPR 2023

RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D ICCV 2023

Multimodal High-order Relation Transformer for Scene Boundary Detection ICCV 2023

Spatio-temporal Prompting Network for Robust Video Feature Extraction ICCV 2023

Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations ACL 2023

Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks ICCV 2023

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline CVPR 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling? ICCV 2023

Video OWL-ViT: Temporally-consistent Open-world Localization in Video ICCV 2023

A-Cap: Anticipation Captioning With Commonsense Knowledge CVPR 2023

How You Feelin'? Learning Emotions and Mental States in Movie Scenes CVPR 2023

Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos ICCV 2023

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization ICCV 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition ICCV 2023

TCOVIS: Temporally Consistent Online Video Instance Segmentation ICCV 2023

A New Comprehensive Benchmark for Semi-Supervised Video Anomaly Detection and Anticipation CVPR 2023