Wanli Ouyang
234 papers · 2013–2026 · 12 top CS/AI conferences
Achievements
Taxonomy Completionist (16)
Hot Topic Early Bird
Renaissance Researcher (5)
Interdisciplinary Bridge
Keyword Pioneer
Keyword Trendsetter Combo (7)
Conference Loyalist (23)
Dynamic Duo (43)
Topic Pioneer
Deep Specialist (39)
Topic Evolution
Keyword Champion (10)
Triple Crown
Grand Slam
Mega-Team (20)
Trend Setter
Century Club (230)
Prolific Year (20)
Unstoppable (13)
The Questioner (2)
Keyword Collector (757)
Conference Pioneer
Conferences
CVPR (77)
ICCV (40)
AAAI (25)
ECCV (25)
NIPS (24)
ICLR (15)
ACL (10)
ICML (7)
IJCAI (4)
EMNLP (3)
WACV (3)
NAACL (1)
Keywords
convolutional neural network (27)
object detection (17)
large language model (15)
point cloud (13)
person re-identification (12)
human pose estimation (12)
neural architecture search (11)
transfer learning (10)
representation learning (10)
self-supervised learning (10)
feature extraction (10)
3d object detection (10)
neural network (9)
semantic segmentation (9)
feature representation (8)
pose estimation (8)
multimodal learning (8)
image classification (8)
autonomous driving (8)
deep learning (8)
Papers
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction
AAAI 2026
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
ACL 2026
Nature-Inspired Population-Based Evolution of Large Language Models
ACL 2026
Mitigating Low-Quality Reasoning in MLLMs: Self-Driven Refined Multimodal CoT with Selective Thinking and Step-wise Visual Enhancement
AAAI 2026
Depth Any Video with Scalable Synthetic Data
ICLR 2025
Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment
ICML 2025
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
CVPR 2025
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
CVPR 2025
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
CVPR 2025
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
CVPR 2025
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
AAAI 2025
Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis
AAAI 2025
GigaGS: 3D Gaussian Based Planar Representation for Large-Scene Surface Reconstruction
AAAI 2025
Towards Efficient and Intelligent Laser Weeding: Method and Dataset for Weed Stem Detection
AAAI 2025
Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
EMNLP 2025
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
ICCV 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
ICCV 2025
MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
ICML 2025
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
ICCV 2025
Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback
ACL 2025
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
ACL 2025
ROGRAG: A Robustly Optimized GraphRAG Framework
ACL 2025
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
ICLR 2025
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
NAACL 2025
Human-Centric Foundation Models: Perception, Generation and Agentic Modeling
IJCAI 2025
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction
ICLR 2025
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
ICLR 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
ICCV 2025
MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses
ICLR 2025
A CLIP-Powered Framework for Robust and Generalizable Data Selection
ICLR 2025
HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction
ICLR 2025
Dense Connector for MLLMs
NIPS 2024
DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion
NIPS 2024
Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling
NIPS 2024
ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention
NIPS 2024
Empowering and Assessing the Utility of Large Language Models in Crop Science
NIPS 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
NIPS 2024
Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
NIPS 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
NIPS 2024
NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
NIPS 2024
EMR-Merging: Tuning-Free High-Performance Model Merging
NIPS 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
NIPS 2024
FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation
NIPS 2024
An Embarrassingly Simple Approach to Enhance Transformer Performance in Genomic Selection for Crop Breeding
IJCAI 2024
ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing
AAAI 2024
Frozen CLIP Transformer Is an Efficient Point Cloud Encoder
AAAI 2024
Boosting Residual Networks with Group Knowledge
AAAI 2024
Semi-supervised 3D Object Detection with PatchTeacher and PillarMix
AAAI 2024
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators
AAAI 2024
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
AAAI 2024
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
ACL 2024
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
ACL 2024
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
ACL 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
ACL 2024
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
ACL 2024
Towards a Self-contained Data-driven Global Weather Forecasting Framework
ICML 2024
FiT: Flexible Vision Transformer for Diffusion Model
ICML 2024
CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
ICML 2024
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
ICLR 2024
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
EMNLP 2024
LOCR: Location-Guided Transformer for Optical Character Recognition
EMNLP 2024
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
ECCV 2024
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
ECCV 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
ECCV 2024
Agent3D-Zero: An Agent for Zero-shot 3D Understanding
ECCV 2024
GVGEN: Text-to-3D Generation with Volumetric Representation
ECCV 2024
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
ECCV 2024
Point Cloud Pre-training with Diffusion Models
CVPR 2024
Point Transformer V3: Simpler Faster Stronger
CVPR 2024
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
CVPR 2024
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
CVPR 2024
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
CVPR 2024
Taming Stable Diffusion for Text to 360 Panorama Image Generation
CVPR 2024
Masked Motion Predictors are Strong 3D Action Representation Learners
ICCV 2023
STEERER: Resolving Scale Variations for Counting and Localization via Selective Inheritance Learning
ICCV 2023
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space
ICCV 2023
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-Training
ICCV 2023
Ponder: Point Cloud Pre-training via Neural Rendering
ICCV 2023
Learning Multi-Modal Class-Specific Tokens for Weakly Supervised Dense Object Localization
CVPR 2023
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer
CVPR 2023
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds
CVPR 2023
Crossing the Gap: Domain Generalization for Image Captioning
CVPR 2023
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
CVPR 2023
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling With Informative-Preserved Reconstruction and Self-Distilled Consistency
CVPR 2023
Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval
NIPS 2023
CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection
NIPS 2023
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
NIPS 2023
UniHCP: A Unified Model for Human-Centric Perceptions
CVPR 2023
Open-Set Fine-Grained Retrieval via Prompting Vision-Language Evaluator
CVPR 2023
ACE: Cooperative Multi-Agent Q-learning with Bidirectional Action-Dependency
AAAI 2023
Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase
AAAI 2023
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
AAAI 2023
Fine-Grained Retrieval Prompt Tuning
AAAI 2023
Exploiting Visual Context Semantics for Sound Source Localization
WACV 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
CVPR 2023
Cycle-consistent Masked AutoEncoder for Unsupervised Domain Generalization
ICLR 2023
Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation
ICLR 2023
SeCo: Separating Unknown Musical Visual Sounds With Consistency Guidance
WACV 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NIPS 2023
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining
CVPR 2023
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
CVPR 2023
Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection
ICCV 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
ICCV 2023
Semi-Supervised Semantic Segmentation under Label Noise via Diverse Learning Groups
ICCV 2023
Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization
ICLR 2022
Stimulative Training of Residual Networks: A Social Psychology Perspective of Loafing
NIPS 2022
Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning
NIPS 2022
Category-Specific Nuance Exploration Network for Fine-Grained Object Retrieval
AAAI 2022
SepFusion: Finding Optimal Fusion Structures for Visual Sound Separation
AAAI 2022
Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
CVPR 2022
Accelerating Neural Network Optimization Through an Automated Control Theory Lens
CVPR 2022
Unsupervised Learning of Accurate Siamese Tracking
CVPR 2022
DR.VIC: Decomposition and Reasoning for Video Individual Counting
CVPR 2022
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer
CVPR 2022
Revisiting the Transferability of Supervised Pretraining: An MLP Perspective
CVPR 2022
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
CVPR 2022
3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal
ECCV 2022
Pose for Everything: Towards Category-Agnostic Pose Estimation
ECCV 2022
Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking
ECCV 2022
Fast-MoCo: Boost Momentum-Based Contrastive Learning with Combinatorial Patches
ECCV 2022
Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective
ECCV 2022
Relative Contrastive Loss for Unsupervised Representation Learning
ECCV 2022
Domain Invariant Masked Autoencoders for Self-Supervised Learning from Multi-Domains
ECCV 2022
NSNet: Non-Saliency Suppression Sampler for Efficient Video Recognition
ECCV 2022
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
ICLR 2022
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
ICLR 2022
RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training
IJCAI 2022
PyMAF: 3D Human Pose and Shape Regression With Pyramidal Mesh Alignment Feedback Loop
ICCV 2021
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
ICCV 2021
Evolving Search Space for Neural Architecture Search
ICCV 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
ICCV 2021
BN-NAS: Neural Architecture Search With Batch Normalization
ICCV 2021
Leveraging Auxiliary Tasks With Affinity Learning for Weakly Supervised Semantic Segmentation
ICCV 2021
Geometry Uncertainty Projection Network for Monocular 3D Object Detection
ICCV 2021
Aggregation With Feature Detection
ICCV 2021
A Continuous Mapping For Augmentation Design
NIPS 2021
Mutual CRF-GNN for Few-Shot Learning
CVPR 2021
Inception Convolution With Efficient Dilation Search
CVPR 2021
Layerwise Optimization by Gradient Decomposition for Continual Learning
CVPR 2021
Delving Into Localization Errors for Monocular 3D Object Detection
CVPR 2021
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
CVPR 2021
Gradient Regularized Contrastive Learning for Continual Domain Adaptation
AAAI 2021
Dynamic Position-aware Network for Fine-grained Image Recognition
AAAI 2021
AutoSampling: Search for Effective Data Sampling Schedules
ICML 2021
Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search
ICCV 2021
3D Hand Pose Estimation with Disentangled Cross-Modal Latent Space
WACV 2020
DASOT: A Unified Framework Integrating Data Association and Single Object Tracking for Online Multi-Object Tracking
AAAI 2020
Hierarchical Online Instance Matching for Person Search
AAAI 2020
Computation Reallocation for Object Detection
ICLR 2020
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition
CVPR 2020
Improving Deep Video Compression by Resolution-adaptive Flow Coding
ECCV 2020
Content Adaptive and Error Propagation Aware Deep Video Compression
ECCV 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection
ECCV 2020
Whole-Body Human Pose Estimation in the Wild
ECCV 2020
Rethinking Pseudo-LiDAR Representation
ECCV 2020
Equalization Loss for Long-Tailed Object Recognition
CVPR 2020
Relational Prototypical Network for Weakly Supervised Temporal Action Localization
AAAI 2020
Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition
AAAI 2020
Multi-Dimensional Pruning: A Unified Framework for Model Compression
CVPR 2020
3D Human Mesh Regression With Dense Correspondence
CVPR 2020
EcoNAS: Finding Proxies for Economical Neural Architecture Search
CVPR 2020
Improving One-Shot NAS by Suppressing the Posterior Fading
CVPR 2020
Improving Auto-Augment via Augmentation-Wise Weight Sharing
NIPS 2020
Channel Pruning Guided by Classification Loss and Feature Importance
AAAI 2020
Libra R-CNN: Towards Balanced Learning for Object Detection
CVPR 2019
Improving Action Localization by Progressive Cross-Stream Cooperation
CVPR 2019
DVC: An End-To-End Deep Video Compression Framework
CVPR 2019
Multi-Person Articulated Tracking With Spatial and Temporal Embeddings
CVPR 2019
Hybrid Task Cascade for Instance Segmentation
CVPR 2019
Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation
CVPR 2019
GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving
CVPR 2019
SR-LSTM: State Refinement for LSTM Towards Pedestrian Trajectory Prediction
CVPR 2019
Crowd Counting With Deep Structured Scale Integration Network
ICCV 2019
LAP-Net: Level-Aware Progressive Network for Image Dehazing
ICCV 2019
Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection
ICCV 2019
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM
ICCV 2019
GradNet: Gradient-Guided Network for Visual Object Tracking
ICCV 2019
Online Hyper-Parameter Learning for Auto-Augmentation Strategy
ICCV 2019
Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
ICCV 2019
AM-LFS: AutoML for Loss Function Search
ICCV 2019
TRB: A Novel Triplet Representation for Understanding 2D Human Body
ICCV 2019
Feature Intertwiner for Object Detection
ICLR 2019
Quantization Mimic: Towards Very Tiny CNN for Object Detection
ECCV 2018
Person Search via A Mask-guided Two-stream CNN Model
ECCV 2018
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
NIPS 2018
Dividing and Aggregating Network for Multi-view Action Recognition
ECCV 2018
Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation
ECCV 2018
Visual Question Generation as Dual Task of Visual Question Answering
CVPR 2018
3D Human Pose Estimation in the Wild by Adversarial Learning
CVPR 2018
Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning
CVPR 2018
Collaborative and Adversarial Network for Unsupervised Domain Adaptation
CVPR 2018
Crowd Counting using Deep Recurrent Spatial-Aware Network
IJCAI 2018
Mask-Guided Contrastive Attention Model for Person Re-Identification
CVPR 2018
PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing
CVPR 2018
Style Aggregated Network for Facial Landmark Detection
CVPR 2018
Attention-Aware Compositional Network for Person Re-Identification
CVPR 2018
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
CVPR 2018
Deep Kalman Filtering Network for Video Compression Artifact Reduction
ECCV 2018
Multi-Context Attention for Human Pose Estimation
CVPR 2017
Learning Feature Pyramids for Human Pose Estimation
ICCV 2017
Scene Graph Generation From Objects, Phrases and Region Captions
ICCV 2017
Quality Aware Network for Set to Set Recognition
CVPR 2017
Learning Spatial Regularization With Image-Level Supervisions for Multi-Label Image Classification
CVPR 2017
Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
CVPR 2017
Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation
CVPR 2017
Chained Cascade Network for Object Detection
ICCV 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
CVPR 2017
Object Detection in Videos With Tubelet Proposal Networks
CVPR 2017
Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction
NIPS 2017
Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism
ICCV 2017
STCT: Sequentially Training Convolutional Networks for Visual Tracking
CVPR 2016
End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation
CVPR 2016
Multi-Bias Non-linear Activation in Deep Neural Networks
ICML 2016
CRF-CNN: Modeling Structured Information in Human Pose Estimation
NIPS 2016
Structured Feature Learning for Pose Estimation
CVPR 2016
Object Detection From Video Tubelets With Convolutional Neural Networks
CVPR 2016
Factors in Finetuning Deep Model for Object Detection With Long-Tail Distribution
CVPR 2016
Learning Deep Feature Representations With Domain Guided Dropout for Person Re-Identification
CVPR 2016
Learning Deep Representation With Large-Scale Attributes
ICCV 2015
Saliency Detection by Multi-Context Deep Learning
CVPR 2015
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
CVPR 2015
Visual Tracking With Fully Convolutional Networks
ICCV 2015
Multi-Task Recurrent Neural Network for Immediacy Prediction
ICCV 2015
Multi-source Deep Learning for Human Pose Estimation
CVPR 2014
Learning Mid-level Filters for Person Re-identification
CVPR 2014
Multi-stage Contextual Deep Learning for Pedestrian Detection
ICCV 2013
Person Re-identification by Salience Matching
ICCV 2013
Joint Deep Learning for Pedestrian Detection
ICCV 2013
Modeling Mutual Visibility Relationship in Pedestrian Detection
CVPR 2013
Single-Pedestrian Detection Aided by Multi-pedestrian Detection
CVPR 2013
Unsupervised Salience Learning for Person Re-identification
CVPR 2013