conftrace_

Limin Wang

111 papers · 2013–2026 · 10 top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (12) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (13)

🧭 Keyword Pioneer 🐝 Cross-Pollinator (13) 🌍 Conference Polyglot (10) 🏠 Conference Loyalist (26) 🤝 Dynamic Duo (30) 🏆 Grand Slam 👥 Mega-Team (38) 🔬 Deep Specialist (26) 🧬 Topic Evolution 🏆 Keyword Champion (19) ❓ The Questioner 📈 Trend Setter 🗃️ Keyword Collector (391) 🔥 Unstoppable (13) ⚡ Prolific Year (9) 💎 Century Club (109) 🚀 Conference Pioneer

Conferences

CVPR (42) ICCV (26) ECCV (11) ICLR (10) AAAI (9) NIPS (8) ICML (2) ACL (1) IJCAI (1) WACV (1)

Top co-authors

Gangshan Wu (30) Yu Qiao (26) Yali Wang (16) Yi Wang (16) Yinan He (12) Kunchang Li (10) Guo Chen (8) Sheng Guo (8) Zhan Tong (7) Xinhao Li (7)

Keywords

video understanding (29) action recognition (19) convolutional neural network (9) self-supervised learning (7) object detection (7) neural network (6) representation learning (6) temporal modeling (6) vision transformer (6) query-based detection (5) masked autoencoder (5) knowledge distillation (5) video recognition (5) action detection (5) multimodal learning (5) video representation (4) temporal action detection (4) diffusion model (4) transfer learning (4) foundation model (4)

Papers

Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment · AAAI 2026
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning · AAAI 2026
Multiple Object Tracking as ID Prediction · CVPR 2025
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning · CVPR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment · CVPR 2025
Transferring Foundation Models for Generalizable Robotic Manipulation · WACV 2025
Differentiable Solver Search for Fast Diffusion Sampling · ICML 2025
Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training · ICML 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning · ICLR 2025
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation · ICLR 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel · ICLR 2025
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding · ICLR 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning · ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · ICLR 2025
Scalable Image Tokenization with Index Backpropagation Quantization · ICCV 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos · ICCV 2025
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay · ICCV 2025
MobileViCLIP: An Efficient Video-Text Model for Mobile Devices · ICCV 2025
Make Your Training Flexible: Towards Deployment-Efficient Video Models · ICCV 2025
Contextual AD Narration with Interleaved Multimodal Sequence · CVPR 2025
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis · CVPR 2025
Online Video Understanding: OVBench and VideoChat-Online · CVPR 2025
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation · ICLR 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation · NIPS 2024
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection? · NIPS 2024
Exploring DCN-like architecture for fast image generation with arbitrary resolution · NIPS 2024
VFIMamba: Video Frame Interpolation with State Space Models · NIPS 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · CVPR 2024
Dual DETRs for Multi-Label Temporal Action Detection · CVPR 2024
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models · CVPR 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos · CVPR 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models · CVPR 2024
Asymmetric Masked Distillation for Pre-Training Small Foundation Models · CVPR 2024
Sparse Global Matching for Video Frame Interpolation with Large Motion · CVPR 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World · CVPR 2024
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos · CVPR 2024
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering · CVPR 2024
Fully Sparse 3D Occupancy Prediction · ECCV 2024
VideoMamba: State Space Model for Efficient Video Understanding · ECCV 2024
Accelerating Image Generation with Sub-path Linear Approximation Model · ECCV 2024
StableDrag: Stable Dragging for Point-based Image Editing · ECCV 2024
ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video · ECCV 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding · ECCV 2024
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens · ICLR 2024
SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes · ICCV 2023
MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking · ICCV 2023
StageInteractor: Query-based Object Detector with Cross-stage Interaction · ICCV 2023
LinK: Linear Kernel for LiDAR-Based 3D Perception · CVPR 2023
MixFormerV2: Efficient Fully Transformer Tracking · NIPS 2023
JourneyDB: A Benchmark for Generative Image Understanding · NIPS 2023
Efficient Video Action Detection with Token Dropout and Context Refinement · ICCV 2023
STMixer: A One-Stage Sparse Action Detector · CVPR 2023
PDPP:Projected Diffusion for Procedure Planning in Instructional Videos · CVPR 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding · ICCV 2023
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding · ICCV 2023
Memory-and-Anticipation Transformer for Online Action Understanding · ICCV 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking · CVPR 2023
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation · CVPR 2023
SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos · ICCV 2023
Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation · ICLR 2023
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets · AAAI 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models · ICCV 2023
Deep Equilibrium Object Detection · ICCV 2023
AdaMixer: A Fast-Converging Query-Based Object Detector · CVPR 2022
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection · CVPR 2022
PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points · NIPS 2022
MixFormer: End-to-End Tracking With Iterative Mixed Attention · CVPR 2022
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding · AAAI 2022
Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing · ECCV 2022
DCAN: Improving Temporal Action Detection via Dual Context Aggregation · AAAI 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training · NIPS 2022
Task-Specific Inconsistency Alignment for Domain Adaptive Object Detection · CVPR 2022
Structured Sparse R-CNN for Direct Scene Graph Generation · CVPR 2022
Cross-Architecture Self-Supervised Video Representation Learning · CVPR 2022
OCSampler: Compressing Videos to One Clip With Single-Step Sampling · CVPR 2022
MGSampler: An Explainable Sampling Strategy for Video Action Recognition · ICCV 2021
PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop · ICCV 2021
Self Supervision to Distillation for Long-Tailed Visual Recognition · ICCV 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions · ICCV 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation · ICCV 2021
Mutual Supervision for Dense Object Detection · ICCV 2021
Target Adaptive Context Aggregation for Video Scene Graph Generation · ICCV 2021
TAM: Temporal Adaptive Module for Video Recognition · ICCV 2021
TDN: Temporal Difference Networks for Efficient Action Recognition · CVPR 2021
CGA-Net: Category Guided Aggregation for Point Cloud Semantic Segmentation · CVPR 2021
V4D: 4D Convolutional Neural Networks for Video-level Representation Learning · ICLR 2020
Knowledge Integration Networks for Action Recognition · AAAI 2020
TEA: Temporal Excitation and Aggregation for Action Recognition · CVPR 2020
Actions as Moving Points · ECCV 2020
Boundary-Aware Cascade Networks for Temporal Action Segmentation · ECCV 2020
Context-Aware RCNN: A Baseline for Action Detection in Videos · ECCV 2020
TEINet: Towards an Efficient Architecture for Video Recognition · AAAI 2020
Finding Action Tubes with a Sparse-to-Dense Framework · AAAI 2020
SketchyCOCO: Image Generation From Freehand Scene Sketches · CVPR 2020
Learning Actor Relation Graphs for Group Activity Recognition · CVPR 2019
LIP: Local Importance-Based Pooling · ICCV 2019
Translate-to-Recognize Networks for RGB-D Scene Recognition · CVPR 2019
Dynamically Visual Disambiguation of Keyword-based Image Search · IJCAI 2019
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition · AAAI 2019
Appearance-and-Relation Networks for Video Classification · CVPR 2018
Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model · ECCV 2018
UntrimmedNets for Weakly Supervised Action Recognition and Detection · CVPR 2017
Temporal Action Detection With Structured Segment Networks · ICCV 2017
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos · CVPR 2017
Actionness Estimation Using Hybrid Fully Convolutional Networks · CVPR 2016
Real-Time Action Recognition With Enhanced Motion Vector CNNs · CVPR 2016
Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors · CVPR 2015
Multi-View Super Vector for Action Recognition · CVPR 2014
Mining Motion Atoms and Phrases for Complex Action Recognition · ICCV 2013
Motionlets: Mid-level 3D Parts for Human Motion Recognition · CVPR 2013
PAL: A Chatterbot System for Answering Domain-specific Questions · ACL 2013