Video Understanding

1592 directly classified papers

Papers

GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos INTERSPEECH 2023

Video Anomaly Detection via Sequentially Learning Multiple Pretext Tasks ICCV 2023

Mulan: A Multi-Level Alignment Model for Video Question Answering EMNLP 2023

VindLU: A Recipe for Effective Video-and-Language Pretraining CVPR 2023

Video OWL-ViT: Temporally-consistent Open-world Localization in Video ICCV 2023

Multimodal Turn-Taking Model Using Visual Cues for End-of-Utterance Prediction in Spoken Dialogue Systems INTERSPEECH 2023

Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos ICCV 2023

TCOVIS: Temporally Consistent Online Video Instance Segmentation ICCV 2023

Tem-Adapter: Adapting Image-Text Pretraining for Video Question Answer ICCV 2023

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval ICCV 2023

Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization ICCV 2023

Contrastive Learning for Sign Language Recognition and Translation IJCAI 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition ICCV 2023

Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning IJCAI 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling? ICCV 2023

Event-Specific Audio-Visual Fusion Layers: A Simple and New Perspective on Video Understanding WACV 2023

Video Summarization Leveraging Multimodal Information for Presentations INTERSPEECH 2023

CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment CVPR 2023

CTVIS: Consistent Training for Online Video Instance Segmentation ICCV 2023

Spectrum-guided Multi-granularity Referring Video Object Segmentation ICCV 2023

Action Sensitivity Learning for Temporal Action Localization ICCV 2023

Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers ICCV 2023

Motion-Guided Masking for Spatiotemporal Representation Learning ICCV 2023

An Empirical Study of Frame Selection for Text-to-Video Retrieval EMNLP 2023

LVOS: A Benchmark for Long-term Video Object Segmentation ICCV 2023