Yali Wang
55 papers · 2012–2026 · 9 top CS/AI conferences
Achievements
Conference Polyglot (9)
Academic Marathon (13)
Keyword Pioneer
Interdisciplinary Bridge
Hot Topic Early Bird
Cross-Pollinator (15)
Taxonomy Completionist (74)
Dynamic Duo (40)
Triple Crown
Grand Slam
Mega-Team (38)
Topic Evolution
Keyword Champion
Conference Pioneer
Keyword Collector (190)
Prolific Year (10)
Unstoppable (10)
Century Club (51)
Conferences
CVPR (17)
AAAI (10)
ICLR (9)
ICCV (7)
ECCV (5)
ICML (3)
NIPS (2)
AISTATS (1)
IJCAI (1)
Keywords
video understanding (6)
multimodal learning (6)
multi-agent system (5)
large language model (5)
object detection (4)
self-supervised learning (4)
action recognition (3)
knowledge distillation (3)
few-shot learning (3)
domain adaptation (3)
reinforcement learning (3)
representation learning (2)
temporal modeling (2)
semantic segmentation (2)
autonomous driving (2)
zero-shot learning (2)
video recognition (2)
pose estimation (2)
multi-task learning (2)
vision transformer (2)
Papers
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
AAAI 2026
VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
AAAI 2026
When Top-ranked Recommendations Fail: Modeling Multi-Granular Negative Feedback for Explainable and Robust Video Recommendation
AAAI 2026
G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
AAAI 2026
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
ICCV 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI 2025
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
AAAI 2025
TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
ICML 2025
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
ICLR 2025
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
ICLR 2025
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
ICLR 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
ICLR 2025
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
ICLR 2025
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
CVPR 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
CVPR 2025
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
CVPR 2025
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
NIPS 2024
M-BEV: Masked BEV Perception for Robust Autonomous Driving
AAAI 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
ECCV 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
VideoMamba: State Space Model for Efficient Video Understanding
ECCV 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
Vlogger: Make Your Dream A Vlog
CVPR 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
CVPR 2024
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding
ICCV 2023
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking
CVPR 2023
Starting From Non-Parametric Networks for 3D Point Cloud Analysis
CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
CVPR 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
ICCV 2023
HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation
ICCV 2023
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
CVPR 2022
Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
CVPR 2022
UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning
ICLR 2022
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
ECCV 2022
Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition
CVPR 2022
Self-Slimmed Vision Transformer
ECCV 2022
Digging Into Uncertainty in Self-Supervised Multi-View Stereo
ICCV 2021
PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos
AAAI 2021
CT-Net: Channel Tensorization Network for Video Classification
ICLR 2021
SmallBigNet: Integrating Core and Contextual Views for Video Classification
CVPR 2020
Learning Attentive Pairwise Interaction for Fine-Grained Classification
AAAI 2020
Context-Transformer: Tackling Object Confusion for Few-Shot Detection
AAAI 2020
Mining Inter-Video Proposal Relations for Video Object Detection
ECCV 2020
Adaptive Pyramid Context Network for Semantic Segmentation
CVPR 2019
PA3D: Pose-Action 3D Machine for Video Recognition
CVPR 2019
MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition
CVPR 2019
Temporal Hallucinating for Action Recognition With Few Still Images
CVPR 2018
RPAN: An End-To-End Recurrent Pose-Attention Network for Action Recognition in Videos
ICCV 2017
Sequential Inference for Deep Gaussian Process
AISTATS 2016
Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations
ICML 2014
A KNN Based Kalman Filter Gaussian Process Regression
IJCAI 2013
A Marginalized Particle Gaussian Process Regression
NIPS 2012