Video Understanding

1592 directly classified papers

Papers

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant CVPR 2025

Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip? EMNLP 2025

Guess Future Anomalies from Normalcy: Forecasting Abnormal Behavior in Real-World Videos WACV 2025

Language-Guided Audio-Visual Learning for Long-Term Sports Assessment CVPR 2025

Scaling Action Detection: AdaTAD++ with Transformer-Enhanced Temporal-Spatial Adaptation ICCV 2025

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs ICCV 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow ICCV 2025

Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval ICCV 2025

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation ICCV 2025

ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning ICCV 2025

HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation CVPR 2025

Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering CVPR 2025

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models CVPR 2025

Flexible Frame Selection for Efficient Video Reasoning CVPR 2025

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation CVPR 2025

Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes CVPR 2025

OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts CVPR 2025

Efficient Motion-Aware Video MLLM CVPR 2025

DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding CVPR 2025

MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models ACL 2025

TACO: Taming Diffusion for in-the-wild Video Amodal Completion ICCV 2025

Towards Real-Time Open-Vocabulary Video Instance Segmentation WACV 2025

Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation ACL 2025

ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction WACV 2025

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking AAAI 2025