Video Understanding

1592 directly classified papers

Papers

CTVIS: Consistent Training for Online Video Instance Segmentation ICCV 2023

Action Sensitivity Learning for Temporal Action Localization ICCV 2023

Motion-Guided Masking for Spatiotemporal Representation Learning ICCV 2023

DVIS: Decoupled Video Instance Segmentation Framework ICCV 2023

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking ICCV 2023

Video Summarization Leveraging Multimodal Information for Presentations INTERSPEECH 2023

Multi-Scale Attention for Audio Question Answering INTERSPEECH 2023

MOSE: A New Dataset for Video Object Segmentation in Complex Scenes ICCV 2023

LVOS: A Benchmark for Long-term Video Object Segmentation ICCV 2023

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert CVPR 2023

MMVP: Motion-Matrix-Based Video Prediction ICCV 2023

Motion Question Answering via Modular Motion Programs ICML 2023

Spatial-Then-Temporal Self-Supervised Learning for Video Correspondence CVPR 2023

Boosting Positive Segments for Weakly-Supervised Audio-Visual Video Parsing ICCV 2023

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding CVPR 2023

Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamic Audio-Visual Scenarios EMNLP 2023

Tracking Everything Everywhere All at Once ICCV 2023

Discovering the Real Association: Multimodal Causal Reasoning in Video Question Answering CVPR 2023

Annotations Are Not All You Need: A Cross-modal Knowledge Transfer Network for Unsupervised Temporal Sentence Grounding EMNLP 2023

Mulan: A Multi-Level Alignment Model for Video Question Answering EMNLP 2023

VindLU: A Recipe for Effective Video-and-Language Pretraining CVPR 2023

Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning ICCV 2023

DCVNet: Dilated Cost Volume Networks for Fast Optical Flow WACV 2023

Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping ICCV 2023

Moment Detection in Long Tutorial Videos ICCV 2023