Ping Luo
237 papers · 2013–2026 · 15 top CS/AI conferences
Achievements
Taxonomy Completionist (21)
Keyword Pioneer
Interdisciplinary Bridge
Renaissance Researcher (6)
Conference Polyglot (15)
Cross-Pollinator (13)
Conference Loyalist (30)
Keyword Trendsetter Combo (4)
Dynamic Duo (41)
Triple Crown
Keyword Champion
Grand Slam
Mega-Team (22)
Deep Specialist (30)
Topic Evolution
Unstoppable (13)
The Questioner (2)
Conference Pioneer
Century Club (234)
Prolific Year (34)
Keyword Collector (51)
Trend Setter
Conferences
CVPR (62)
ICCV (37)
NIPS (30)
ECCV (23)
ICLR (23)
ICML (18)
AAAI (14)
ACL (13)
IJCAI (8)
COLING (2)
EMNLP (2)
RSS (2)
CORL (1)
NAACL (1)
WACV (1)
Keywords
object detection (21)
convolutional neural network (19)
semantic segmentation (17)
large language model (17)
image generation (11)
vision-language model (10)
transfer learning (10)
knowledge distillation (9)
model compression (9)
diffusion model (8)
multi-modal learning (8)
contrastive learning (7)
autonomous driving (7)
representation learning (7)
vision transformer (7)
multimodal learning (7)
deep learning (7)
self-supervised learning (6)
foundation model (6)
instance segmentation (6)
Papers
Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers
AAAI 2026
Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching
ACL 2026
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
AAAI 2026
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
CVPR 2025
AnalogCoder: Analog Circuit Design via Training-Free Code Generation
AAAI 2025
End-to-End Autonomous Driving Through V2X Cooperation
AAAI 2025
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
AAAI 2025
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
ICCV 2025
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
ICCV 2025
MangaNinja: Line Art Colorization with Precise Reference Following
CVPR 2025
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
CVPR 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
NAACL 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
ACL 2025
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
ACL 2025
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
ACL 2025
Whether LLMs Know If They Know: Identifying Knowledge Boundaries via Debiased Historical In-Context Learning
ACL 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
ACL 2025
Learning to Act Anywhere with Task-centric Latent Actions
RSS 2025
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
CVPR 2025
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
CVPR 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
ICCV 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
ICLR 2025
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
ICLR 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
ICLR 2025
NADER: Neural Architecture Design via Multi-Agent Collaboration
CVPR 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
ICML 2025
BOOD: Boundary-based Out-Of-Distribution Data Generation
ICML 2025
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
CVPR 2025
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
CVPR 2025
Goku: Flow Based Video Generative Foundation Models
CVPR 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
CVPR 2025
AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation
EMNLP 2025
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
NIPS 2024
Needle In A Multimodal Haystack
NIPS 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
NIPS 2024
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
ECCV 2024
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
ECCV 2024
UniFS: Universal Few-shot Instance Perception with Point Representations
ECCV 2024
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
ECCV 2024
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
ECCV 2024
DriveLM: Driving with Graph Visual Question Answering
ECCV 2024
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
ECCV 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models
NIPS 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
NIPS 2024
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs
NIPS 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
ICML 2024
Position: Towards Implicit Prompt For Text-To-Image Models
ICML 2024
Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary
ICML 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
ICML 2024
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving
AAAI 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache
AAAI 2024
LLaMA Pro: Progressive LLaMA with Block Expansion
ACL 2024
Uncovering Limitations of Large Language Models in Information Seeking from Tables
ACL 2024
URG: A Unified Ranking and Generation Method for Ensembling Language Models
ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering
COLING 2024
TAeKD: Teacher Assistant Enhanced Knowledge Distillation for Closed-Source Multilingual Neural Machine Translation
COLING 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
VDT: General-purpose Video Diffusion Transformers via Mask Modeling
ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
ICLR 2024
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
ICLR 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
ICLR 2024
PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation
ICLR 2024
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
ICLR 2024
Learning Manipulation by Predicting Interaction
RSS 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
CVPR 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
Generalized Predictive Model for Autonomous Driving
CVPR 2024
RegionGPT: Towards Region Understanding Vision Language Model
CVPR 2024
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
GenTron: Diffusion Transformers for Image and Video Generation
CVPR 2024
Exploring Transformers for Open-world Instance Segmentation
ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos
ICCV 2023
DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations
AAAI 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NIPS 2023
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
NIPS 2023
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths
NIPS 2023
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NIPS 2023
Going Denser with Open-Vocabulary Part Segmentation
ICCV 2023
Accelerating Vision-Language Pretraining With Free Language Modeling
CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
CVPR 2023
Universal Instance Perception As Object Discovery and Retrieval
CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer
CVPR 2023
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting
CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge
CVPR 2023
EC2: Emergent Communication for Embodied Control
CVPR 2023
Real-Time Controllable Denoising for Image and Video
CVPR 2023
Policy Adaptation From Foundation Model Feedback
CVPR 2023
Dense Distinct Query for End-to-End Object Detection
CVPR 2023
Foundation Model is Efficient Multimodal Multitask Model Selector
NIPS 2023
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer
ICML 2023
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
ICML 2023
Guideline Learning for In-Context Information Extraction
EMNLP 2023
EGC: Image Generation and Classification via a Diffusion Energy-Based Model
ICCV 2023
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation
ICCV 2023
Scene as Occupancy
ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
Segment Every Reference Object in Spatial and Temporal Spaces
ICCV 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
ICCV 2023
Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification
NIPS 2023
Learning Object-Language Alignments for Open-Vocabulary Object Detection
ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving
ICLR 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
ICLR 2023
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
ICML 2023
Structured Pruning for Efficient Generative Pre-trained Language Models
ACL 2023
DSP: Discriminative Soft Prompts for Zero-Shot Entity and Relation Extraction
ACL 2023
DiffusionDet: Diffusion Model for Object Detection
ICCV 2023
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
CVPR 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
NIPS 2022
Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes
NIPS 2022
MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
NIPS 2022
DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
NIPS 2022
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
NIPS 2022
Rethinking Resolution in the Context of Efficient Video Recognition
NIPS 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
CORL 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization
AAAI 2022
Compression of Generative Pre-trained Language Models via Quantization
ACL 2022
Bridging Video-Text Retrieval With Multiple Choice Questions
CVPR 2022
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs
CVPR 2022
Language As Queries for Referring Video Object Segmentation
CVPR 2022
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer
CVPR 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers
CVPR 2022
Scale-Equivalent Distillation for Semi-Supervised Object Detection
CVPR 2022
PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation
ECCV 2022
3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal
ECCV 2022
Pose for Everything: Towards Category-Agnostic Pose Estimation
ECCV 2022
Towards Grand Unification of Object Tracking
ECCV 2022
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
ECCV 2022
DaViT: Dual Attention Vision Transformers
ECCV 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space
ECCV 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval
ECCV 2022
Objects in Semantic Topology
ICLR 2022
Dynamic Token Normalization improves Vision Transformers
ICLR 2022
Learning Versatile Neural Architectures by Propagating Network Codes
ICLR 2022
Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization
ICLR 2022
CycleMLP: A MLP-like Architecture for Dense Prediction
ICLR 2022
Flow-based Recurrent Belief State Learning for POMDPs
ICML 2022
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
ICML 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
ICML 2022
Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning
IJCAI 2022
Compensation Tracker: Reprocessing Lost Object for Multi-Object Tracking
WACV 2022
Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On
CVPR 2021
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
CVPR 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions
ICCV 2021
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
ICML 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
NIPS 2021
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs
ICLR 2021
Compressed Video Contrastive Learning
NIPS 2021
Segmenting Transparent Objects in the Wild with Transformer
IJCAI 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NIPS 2021
DetCo: Unsupervised Contrastive Learning for Object Detection
ICCV 2021
Adversarial Robustness for Unsupervised Domain Adaptation
ICCV 2021
Watch Only Once: An End-to-End Video Action Detection Framework
ICCV 2021
Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames
ICCV 2021
STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement
ICCV 2021
End-to-End Dense Video Captioning With Parallel Decoding
ICCV 2021
Model-Based Reinforcement Learning via Imagination with Derived Memory
NIPS 2021
What Makes for End-to-End Object Detection?
ICML 2021
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
NIPS 2021
Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers
AAAI 2021
A Unified Multi-Scenario Attacking Network for Visual Object Tracking
AAAI 2021
A Bottom-Up DAG Structure Extraction Model for Math Word Problems
AAAI 2021
Rethinking the Pruning Criteria for Convolutional Neural Network
NIPS 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers
CVPR 2021
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
CVPR 2021
Parser-Free Virtual Try-On via Distilling Appearance Flows
CVPR 2021
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals
CVPR 2021
3D Human Mesh Regression With Dense Correspondence
CVPR 2020
Segmenting Transparent Objects in the Wild
ECCV 2020
Whole-Body Human Pose Estimation in the Wild
ECCV 2020
Webly Supervised Image Classification with Self-Contained Confidence
ECCV 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation
ECCV 2020
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
ECCV 2020
Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction
ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation
CVPR 2020
Exemplar Normalization for Learning Deep Representation
CVPR 2020
Online Knowledge Distillation via Collaborative Learning
CVPR 2020
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
CVPR 2020
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content
CVPR 2020
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation
CVPR 2020
Learning a Reinforced Agent for Flexible Exposure Bracketing Selection
CVPR 2020
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
AAAI 2020
Channel Equilibrium Networks for Learning Deep Representation
ICML 2020
Differentiable Dynamic Normalization for Learning Deep Representation
ICML 2019
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
CVPR 2019
SSN: Learning Sparse Switchable Normalization via SparsestMax
CVPR 2019
Switchable Whitening for Deep Representation Learning
ICCV 2019
Deep Self-Learning From Noisy Labels
ICCV 2019
Vision-Infused Deep Audio Inpainting
ICCV 2019
Differentiable Learning-to-Normalize via Switchable Normalization
ICLR 2019
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
AAAI 2019
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
ICCV 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks
ICCV 2019
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid
ICCV 2019
Towards Understanding Regularization in Batch Normalization
ICLR 2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once
ICCV 2019
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net
ECCV 2018
Adaboost with Auto-Evaluation for Conversational Models
IJCAI 2018
FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis
CVPR 2018
Kalman Normalization: Normalizing Internal Representations Across Network Layers
NIPS 2018
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
CVPR 2017
Learning Object Interactions and Descriptions for Semantic Image Segmentation
CVPR 2017
EigenNet: Towards Fast and Structural Learning of Deep Neural Networks
IJCAI 2017
Learning Deep Architectures via Generalized Whitened Neural Networks
ICML 2017
Deep Dual Learning for Semantic Image Segmentation
ICCV 2017
Browsing Regularities in Hedonic Content Systems
IJCAI 2016
WIDER FACE: A Face Detection Benchmark
CVPR 2016
DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations
CVPR 2016
Deep Learning Face Attributes in the Wild
ICCV 2015
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
CVPR 2015
Pedestrian Detection Aided by Deep Learning Semantic Tasks
CVPR 2015
Matrix Factorization with Scale-Invariant Parameters
IJCAI 2015
Supervised Representation Learning: Transfer Learning with Deep Autoencoders
IJCAI 2015
Semantic Image Segmentation via Deep Parsing Network
ICCV 2015
Deep Learning Strong Parts for Pedestrian Detection
ICCV 2015
Learning Social Relation Traits From Face Images
ICCV 2015
From Facial Parts Responses to Face Detection: A Deep Learning Approach
ICCV 2015
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
CVPR 2015
Switchable Deep Network for Pedestrian Detection
CVPR 2014
Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations
NIPS 2014
Clothing Co-Parsing by Joint Image Segmentation and Labeling
CVPR 2014
Deep Learning Identity-Preserving Face Space
ICCV 2013
Pedestrian Parsing via Deep Decompositional Network
ICCV 2013
A Deep Sum-Product Architecture for Robust Facial Attributes Analysis
ICCV 2013
Concept Learning for Cross-Domain Text Classification: A General Probabilistic Framework
IJCAI 2013