Limin Wang
111 papers · 2013–2026 · 10 top CS/AI conferences
Achievements
🌍 Conference Polyglot (10)
🏃 Academic Marathon (12)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (13)
🏠 Conference Loyalist (26)
🤝 Dynamic Duo (30)
🏆 Grand Slam
👥 Mega-Team (38)
🔬 Deep Specialist (26)
🧬 Topic Evolution
🏆 Keyword Champion (19)
❓ The Questioner
📈 Trend Setter
🗃️ Keyword Collector (391)
🔥 Unstoppable (13)
⚡ Prolific Year (9)
💎 Century Club (109)
🚀 Conference Pioneer
Conferences
CVPR (42)
ICCV (26)
ECCV (11)
ICLR (10)
AAAI (9)
NIPS (8)
ICML (2)
ACL (1)
IJCAI (1)
WACV (1)
Keywords
video understanding (29)
action recognition (19)
convolutional neural network (9)
self-supervised learning (7)
object detection (7)
neural network (6)
representation learning (6)
temporal modeling (6)
vision transformer (6)
query-based detection (5)
masked autoencoder (5)
knowledge distillation (5)
video recognition (5)
action detection (5)
multimodal learning (5)
video representation (4)
temporal action detection (4)
diffusion model (4)
transfer learning (4)
foundation model (4)
Papers
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
AAAI 2026
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
AAAI 2026
Multiple Object Tracking as ID Prediction
CVPR 2025
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
CVPR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
Transferring Foundation Models for Generalizable Robotic Manipulation
WACV 2025
Differentiable Solver Search for Fast Diffusion Sampling
ICML 2025
Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training
ICML 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
ICLR 2025
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
ICLR 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Scalable Image Tokenization with Index Backpropagation Quantization
ICCV 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
ICCV 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
ICCV 2025
Make Your Training Flexible: Towards Deployment-Efficient Video Models
ICCV 2025
Contextual AD Narration with Interleaved Multimodal Sequence
CVPR 2025
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
CVPR 2025
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
NIPS 2024
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
NIPS 2024
Exploring DCN-like architecture for fast image generation with arbitrary resolution
NIPS 2024
VFIMamba: Video Frame Interpolation with State Space Models
NIPS 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
Dual DETRs for Multi-Label Temporal Action Detection
CVPR 2024
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
CVPR 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models
CVPR 2024
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
CVPR 2024
Sparse Global Matching for Video Frame Interpolation with Large Motion
CVPR 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
CVPR 2024
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
CVPR 2024
Fully Sparse 3D Occupancy Prediction
ECCV 2024
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
Accelerating Image Generation with Sub-path Linear Approximation Model
ECCV 2024
StableDrag: Stable Dragging for Point-based Image Editing
ECCV 2024
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
ECCV 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
ECCV 2024
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
ICLR 2024
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes
ICCV 2023
MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
ICCV 2023
StageInteractor: Query-based Object Detector with Cross-stage Interaction
ICCV 2023
LinK: Linear Kernel for LiDAR-Based 3D Perception
CVPR 2023
MixFormerV2: Efficient Fully Transformer Tracking
NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding
NIPS 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
ICCV 2023
STMixer: A One-Stage Sparse Action Detector
CVPR 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
CVPR 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
ICCV 2023
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
ICCV 2023
Memory-and-Anticipation Transformer for Online Action Understanding
ICCV 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
CVPR 2023
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
CVPR 2023
SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
ICCV 2023
Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation
ICLR 2023
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
AAAI 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023
Deep Equilibrium Object Detection
ICCV 2023
AdaMixer: A Fast-Converging Query-Based Object Detector
CVPR 2022
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
CVPR 2022
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
NIPS 2022
MixFormer: End-to-End Tracking With Iterative Mixed Attention
CVPR 2022
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
AAAI 2022
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
ECCV 2022
DCAN: Improving Temporal Action Detection via Dual Context Aggregation
AAAI 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
NIPS 2022
Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection
CVPR 2022
Structured Sparse R-CNN for Direct Scene Graph Generation
CVPR 2022
Cross-Architecture Self-Supervised Video Representation Learning
CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling
CVPR 2022
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
ICCV 2021
PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop
ICCV 2021
Self Supervision to Distillation for Long-Tailed Visual Recognition
ICCV 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
ICCV 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
ICCV 2021
Mutual Supervision for Dense Object Detection
ICCV 2021
Target Adaptive Context Aggregation for Video Scene Graph Generation
ICCV 2021
TAM: Temporal Adaptive Module for Video Recognition
ICCV 2021
TDN: Temporal Difference Networks for Efficient Action Recognition
CVPR 2021
CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation
CVPR 2021
V4D: 4D Convolutional Neural Networks for Video-level Representation Learning
ICLR 2020
Knowledge Integration Networks for Action Recognition
AAAI 2020
TEA: Temporal Excitation and Aggregation for Action Recognition
CVPR 2020
Actions as Moving Points
ECCV 2020
Boundary-Aware Cascade Networks for Temporal Action Segmentation
ECCV 2020
Context-Aware RCNN: A Baseline for Action Detection in Videos
ECCV 2020
TEINet: Towards an Efficient Architecture for Video Recognition
AAAI 2020
Finding Action Tubes with a Sparse-to-Dense Framework
AAAI 2020
SketchyCOCO: Image Generation From Freehand Scene Sketches
CVPR 2020
Learning Actor Relation Graphs for Group Activity Recognition
CVPR 2019
LIP: Local Importance-Based Pooling
ICCV 2019
Translate-to-Recognize Networks for RGB-D Scene Recognition
CVPR 2019
Dynamically Visual Disambiguation of Keyword-based Image Search
IJCAI 2019
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
AAAI 2019
Appearance-and-Relation Networks for Video Classification
CVPR 2018
Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model
ECCV 2018
UntrimmedNets for Weakly Supervised Action Recognition and Detection
CVPR 2017
Temporal Action Detection With Structured Segment Networks
ICCV 2017
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
CVPR 2017
Actionness Estimation Using Hybrid Fully Convolutional Networks
CVPR 2016
Real-Time Action Recognition With Enhanced Motion Vector CNNs
CVPR 2016
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
CVPR 2015
Multi-View Super Vector for Action Recognition
CVPR 2014
Mining Motion Atoms and Phrases for Complex Action Recognition
ICCV 2013
Motionlets: Mid-level 3D Parts for Human Motion Recognition
CVPR 2013
PAL: A Chatterbot System for Answering Domain-specific Questions
ACL 2013