Research Explorer
Video Understanding
1592 directly classified papers
Papers per year
2006: 1
2012: 1
2013: 30
2014: 15
2015: 38
2016: 22
2017: 39
2018: 49
2019: 91
2020: 115
2021: 207
2022: 160
2023: 254
2024: 216
2025: 297
2026: 57
Papers
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
EMNLP 2024
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
CVPR 2024
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
CVPR 2024
Reconsidering Sentence-Level Sign Language Translation
EMNLP 2024
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
CVPR 2024
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
CVPR 2024
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
NeurIPS 2024
Context-Guided Spatio-Temporal Video Grounding
CVPR 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
EMNLP 2024
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series
AAAI 2024
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression
AAAI 2024
Motion-Aware Heatmap Regression for Human Pose Estimation in Videos
IJCAI 2024
N-gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding
AAAI 2024
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding
AAAI 2024
Context Enhanced Transformer for Single Image Object Detection in Video Data
AAAI 2024
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
AAAI 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
CVPR 2024
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language
AAAI 2024
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
CVPR 2024
Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval
IJCAI 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
CVPR 2024
Comprehensive Visual Grounding for Video Description
AAAI 2024
Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
ECCV 2024
Segment Any Change
NeurIPS 2024
Movie Genre Classification by Language Augmentation and Shot Sampling
WACV 2024