
Video Understanding

1592 directly classified papers


Papers

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation (ICCV 2023)

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions (ACL 2023)

Movement Enhancement toward Multi-Scale Video Feature Representation for Temporal Action Detection (ICCV 2023)

Diffusion Action Segmentation (ICCV 2023)

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding (ACL 2023)

TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement (ICCV 2023)

Leaping Into Memories: Space-Time Deep Feature Synthesis (ICCV 2023)

Scene-robust Natural Language Video Localization via Learning Domain-invariant Representations (ACL 2023)

Be Everywhere - Hear Everything (BEE): Audio Scene Reconstruction by Sparse Audio-Visual Samples (ICCV 2023)

Revealing Single Frame Bias for Video-and-Language Learning (ACL 2023)

Multimodal Persona Based Generation of Comic Dialogs (ACL 2023)

GliTr: Glimpse Transformers With Spatiotemporal Consistency for Online Action Prediction (WACV 2023)

MonoDVPS: A Self-Supervised Monocular Depth Estimation Approach to Depth-Aware Video Panoptic Segmentation (WACV 2023)

MARLIN: Masked Autoencoder for Facial Video Representation LearnINg (CVPR 2023)

Exposing the Self-Supervised Space-Time Correspondence Learning via Graph Kernels (AAAI 2023)

Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer (AAAI 2023)

VADER: Video Alignment Differencing and Retrieval (ICCV 2023)

Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation (ICCV 2023)

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment (ICCV 2023)

Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks (ICCV 2023)

Video OWL-ViT: Temporally-consistent Open-world Localization in Video (ICCV 2023)

DCVNet: Dilated Cost Volume Networks for Fast Optical Flow (WACV 2023)

MASTAF: A Model-Agnostic Spatio-Temporal Attention Fusion Network for Few-Shot Video Classification (WACV 2023)

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition (ICCV 2023)

AVE-CLIP: AudioCLIP-Based Multi-Window Temporal Transformer for Audio Visual Event Localization (WACV 2023)