Research Explorer
Video Understanding
1592 directly classified papers
Papers per year
2006: 1
2012: 1
2013: 30
2014: 15
2015: 38
2016: 22
2017: 39
2018: 49
2019: 91
2020: 115
2021: 207
2022: 160
2023: 254
2024: 216
2025: 297
2026: 57
Papers
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
AAAI 2025
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
AAAI 2025
ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
EMNLP 2025
Towards Universal Soccer Video Understanding
CVPR 2025
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
CVPR 2024
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
CVPR 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
INTERSPEECH 2024
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
CVPR 2024
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features
INTERSPEECH 2024
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
CVPR 2024
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
CVPR 2024
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
CVPR 2024
Learning Object State Changes in Videos: An Open-World Perspective
CVPR 2024
Video Token Merging for Long Video Understanding
NeurIPS 2024
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
CVPR 2024
A multimodal analysis of different types of laughter expression in conversational dialogues
INTERSPEECH 2024
Streaming Dense Video Captioning
CVPR 2024
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
ECCV 2024
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
CVPR 2024
Reconsidering Sentence-Level Sign Language Translation
EMNLP 2024
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
WACV 2024
Implicit Motion Function
CVPR 2024
Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection
EMNLP 2024
Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval
WACV 2024
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
EMNLP 2024