Video Understanding

1592 directly classified papers

Papers

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding EMNLP 2022

Multimodal Conversation Modelling for Topic Derailment Detection EMNLP 2022

Clean Text and Full-Body Transformer: Microsoft’s Submission to the WMT22 Shared Task on Sign Language Translation EMNLP 2022

Revisiting the "Video" in Video-Language Understanding CVPR 2022

UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection CVPR 2022

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant EMNLP 2022

Extending Phrase Grounding with Pronouns in Visual Dialogues EMNLP 2022

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval EMNLP 2022

MovieUN: A Dataset for Movie Understanding and Narrating EMNLP 2022

Context-Aware Biaffine Localizing Network for Temporal Sentence Grounding CVPR 2021

Explainable Video Entailment With Grounded Visual Evidence ICCV 2021

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-Shot Segmentation WACV 2021

Supervoxel Attention Graphs for Long-Range Video Modeling WACV 2021

XVFI: eXtreme Video Frame Interpolation ICCV 2021

Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation ICCV 2021

Divide and Conquer for Single-Frame Temporal Action Localization ICCV 2021

Video Instance Segmentation With a Propose-Reduce Paradigm ICCV 2021

Learning Implicit Temporal Alignment for Few-shot Video Classification IJCAI 2021

Cross-Modal Learning for Audio-Visual Video Parsing INTERSPEECH 2021

Weakly-Supervised Video Anomaly Detection With Robust Temporal Feature Magnitude Learning ICCV 2021

Relaxed Transformer Decoders for Direct Action Proposal Generation ICCV 2021

Pyramid Spatial-Temporal Aggregation for Video-Based Person Re-Identification ICCV 2021

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild CVPR 2021

Affect2MM: Affective Analysis of Multimedia Content Using Emotion Causality CVPR 2021

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos INTERSPEECH 2021