Research Explorer
Video Understanding
1592 directly classified papers
Papers per year
2006: 1
2012: 1
2013: 30
2014: 15
2015: 38
2016: 22
2017: 39
2018: 49
2019: 91
2020: 115
2021: 207
2022: 160
2023: 254
2024: 216
2025: 297
2026: 57
Papers
Fewer Steps, Better Performance: Efficient Cross-Modal Clip Trimming for Video Moment Retrieval Using Language
AAAI 2024
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
EMNLP 2024
MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
EMNLP 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
AAAI 2024
DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
AAAI 2024
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
COLING 2024
Multi-View Dynamic Reflection Prior for Video Glass Surface Detection
AAAI 2024
DiffusionTrack: Diffusion Model for Multi-Object Tracking
AAAI 2024
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
CVPR 2024
TD²-Net: Toward Denoising and Debiasing for Video Scene Graph Generation
AAAI 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
AAAI 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI 2024
Context Enhanced Transformer for Single Image Object Detection in Video Data
AAAI 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
INTERSPEECH 2024
MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features
INTERSPEECH 2024
Video-Text Prompting for Weakly Supervised Spatio-Temporal Video Grounding
EMNLP 2024
A multimodal analysis of different types of laughter expression in conversational dialogues
INTERSPEECH 2024
Diving Deep into the Motion Representation of Video-Text Models
ACL 2024
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
CVPR 2024
SAVSR: Arbitrary-Scale Video Super-Resolution via a Learned Scale-Adaptive Network
AAAI 2024
Streaming Dense Video Captioning
CVPR 2024
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
CVPR 2024
Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis
AAAI 2024
Frame2: A FrameNet-based Multimodal Dataset for Tackling Text-image Interactions in Video
COLING 2024