Research Explorer
Video Understanding
1592 directly classified papers
Papers per year
2006: 1
2012: 1
2013: 30
2014: 15
2015: 38
2016: 22
2017: 39
2018: 49
2019: 91
2020: 115
2021: 207
2022: 160
2023: 254
2024: 216
2025: 297
2026: 57
Papers
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
CVPR 2024
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding
AAAI 2024
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
AAAI 2024
Context Enhanced Transformer for Single Image Object Detection in Video Data
AAAI 2024
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language
AAAI 2024
Comprehensive Visual Grounding for Video Description
AAAI 2024
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
CVPR 2024
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
CVPR 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
CVPR 2024
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
EMNLP 2024
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
AAAI 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NeurIPS 2024
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
CVPR 2024
MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
EMNLP 2024
Temporally-Consistent Video Semantic Segmentation With Bidirectional Occlusion-Guided Feature Propagation
WACV 2024
DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
AAAI 2024
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
ACL 2024
Point-VOS: Pointing Up Video Object Segmentation
CVPR 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
CVPR 2024
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
CVPR 2024
MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations
NSDI 2024
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
CVPR 2024
TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation
AAAI 2024
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
CVPR 2024