conftrace_

Ping Luo

237 papers · 2013–2026 · 15 conferences · across top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (21) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (15)

🐝 Cross-Pollinator (13) 🏠 Conference Loyalist (30) 🌟 Keyword Trendsetter Combo (4) 🤝 Dynamic Duo (41) 👑 Triple Crown 🏆 Keyword Champion 🏆 Grand Slam 👥 Mega-Team (22) 🔬 Deep Specialist (30) 🧬 Topic Evolution 🔥 Unstoppable (13) ❓ The Questioner (2) 🚀 Conference Pioneer 💎 Century Club (234) ⚡ Prolific Year (34) 🗃️ Keyword Collector (51) 📈 Trend Setter

Conferences

CVPR (62) ICCV (37) NIPS (30) ECCV (23) ICLR (23) ICML (18) AAAI (14) ACL (13) IJCAI (8) COLING (2) EMNLP (2) RSS (2) CORL (1) NAACL (1) WACV (1)

Top co-authors

Wenqi Shao (41) Xiaogang Wang (31) Yu Qiao (25) Mingyu Ding (23) Enze Xie (22) Xiaoou Tang (20) Peize Sun (19) Kaipeng Zhang (19) Wentao Liu (18) Chongjian GE (18)

Keywords

object detection (21) convolutional neural network (19) semantic segmentation (17) large language model (17) image generation (11) vision-language model (10) transfer learning (10) knowledge distillation (9) model compression (9) diffusion model (8) multi-modal learning (8) contrastive learning (7) autonomous driving (7) representation learning (7) vision transformer (7) multimodal learning (7) deep learning (7) self-supervised learning (6) foundation model (6) instance segmentation (6)

Papers

Laytrol: Preserving Pretrained Knowledge in Layout Control for Multimodal Diffusion Transformers AAAI 2026
Beyond Query Memorization: Large Language Model Routing with Query Decomposition and Historical Matching ACL 2026
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation AAAI 2026
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins CVPR 2025
AnalogCoder: Analog Circuit Design via Training-Free Code Generation AAAI 2025
End-to-End Autonomous Driving Through V2X Cooperation AAAI 2025
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks AAAI 2025
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM ICCV 2025
GUIOdyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices ICCV 2025
MangaNinja: Line Art Colorization with Precise Reference Following CVPR 2025
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling CVPR 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation CVPR 2025
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots NAACL 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ACL 2025
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model ACL 2025
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution ACL 2025
Whether LLMs Know If They Know: Identifying Knowledge Boundaries via Debiased Historical In-Context Learning ACL 2025
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation ACL 2025
Learning to Act Anywhere with Task-centric Latent Actions RSS 2025
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation CVPR 2025
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data CVPR 2025
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation ICCV 2025
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement ICLR 2025
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ICLR 2025
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ICLR 2025
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping ICLR 2025
NADER: Neural Architecture Design via Multi-Agent Collaboration CVPR 2025
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ICML 2025
BOOD: Boundary-based Out-Of-Distribution Data Generation ICML 2025
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation CVPR 2025
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians CVPR 2025
Goku: Flow Based Video Generative Foundation Models CVPR 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models CVPR 2025
AttnComp: Attention-Guided Adaptive Context Compression for Retrieval-Augmented Generation EMNLP 2025
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation NIPS 2024
Needle In A Multimodal Haystack NIPS 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts NIPS 2024
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge NIPS 2024
GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition ECCV 2024
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception ECCV 2024
UniFS: Universal Few-shot Instance Perception with Point Representations ECCV 2024
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation ECCV 2024
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset ECCV 2024
DriveLM: Driving with Graph Visual Question Answering ECCV 2024
Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts ECCV 2024
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks NIPS 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality NIPS 2024
ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models NIPS 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies NIPS 2024
Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs NIPS 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI ICML 2024
Position: Towards Implicit Prompt For Text-To-Image Models ICML 2024
Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary ICML 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View ICML 2024
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving AAAI 2024
Cached Transformers: Improving Transformers with Differentiable Memory Cache AAAI 2024
LLaMA Pro: Progressive LLaMA with Block Expansion ACL 2024
Uncovering Limitations of Large Language Models in Information Seeking from Tables ACL 2024
URG: A Unified Ranking and Generation Method for Ensembling Language Models ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning ACL 2024
KET-QA: A Dataset for Knowledge Enhanced Table Question Answering COLING 2024
TAeKD: Teacher Assistant Enhanced Knowledge Distillation for Closed-Source Multilingual Neural Machine Translation COLING 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis ICML 2024
VDT: General-purpose Video Diffusion Transformers via Mask Modeling ICLR 2024
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models ICLR 2024
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis ICLR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ICLR 2024
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models ICLR 2024
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation ICLR 2024
PROGRAM: PROtotype GRAph Model based Pseudo-Label Learning for Test-Time Adaptation ICLR 2024
Large Language Models as Automated Aligners for benchmarking Vision-Language Models ICLR 2024
Learning Manipulation by Predicting Interaction RSS 2024
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark CVPR 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model CVPR 2024
Generalized Predictive Model for Autonomous Driving CVPR 2024
RegionGPT: Towards Region Understanding Vision Language Model CVPR 2024
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks CVPR 2024
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM CVPR 2024
GenTron: Diffusion Transformers for Image and Video Generation CVPR 2024
Exploring Transformers for Open-world Instance Segmentation ICCV 2023
DDP: Diffusion Model for Dense Visual Prediction ICCV 2023
RIGID: Recurrent GAN Inversion and Editing of Real Face Videos ICCV 2023
DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations AAAI 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks NIPS 2023
Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection NIPS 2023
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths NIPS 2023
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought NIPS 2023
Going Denser with Open-Vocabulary Part Segmentation ICCV 2023
Accelerating Vision-Language Pretraining With Free Language Modeling CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention CVPR 2023
Universal Instance Perception As Object Discovery and Retrieval CVPR 2023
RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer CVPR 2023
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting CVPR 2023
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge CVPR 2023
EC2: Emergent Communication for Embodied Control CVPR 2023
Real-Time Controllable Denoising for Image and Video CVPR 2023
Policy Adaptation From Foundation Model Feedback CVPR 2023
Dense Distinct Query for End-to-End Object Detection CVPR 2023
Foundation Model is Efficient Multimodal Multitask Model Selector NIPS 2023
ChiPFormer: Transferable Chip Placement via Offline Decision Transformer ICML 2023
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners ICML 2023
Guideline Learning for In-Context Information Extraction EMNLP 2023
EGC: Image Generation and Classification via a Diffusion Energy-Based Model ICCV 2023
MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation ICCV 2023
Scene as Occupancy ICCV 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers ICCV 2023
Segment Every Reference Object in Spatial and Temporal Spaces ICCV 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation ICCV 2023
Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification NIPS 2023
Learning Object-Language Alignments for Open-Vocabulary Object Detection ICLR 2023
CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving ICLR 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning ICLR 2023
$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation ICML 2023
Structured Pruning for Efficient Generative Pre-trained Language Models ACL 2023
DSP: Discriminative Soft Prompts for Zero-Shot Entity and Relation Extraction ACL 2023
DiffusionDet: Diffusion Model for Object Detection ICCV 2023
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion CVPR 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition NIPS 2022
Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes NIPS 2022
MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning NIPS 2022
DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning NIPS 2022
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation NIPS 2022
Rethinking Resolution in the Context of Efficient Video Recognition NIPS 2022
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following CORL 2022
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization AAAI 2022
Compression of Generative Pre-trained Language Models via Quantization ACL 2022
Bridging Video-Text Retrieval With Multiple Choice Questions CVPR 2022
RestoreFormer: High-Quality Blind Face Restoration From Undegraded Key-Value Pairs CVPR 2022
Language As Queries for Referring Video Object Segmentation CVPR 2022
Not All Tokens Are Equal: Human-Centric Visual Analysis via Token Clustering Transformer CVPR 2022
Panoptic SegFormer: Delving Deeper Into Panoptic Segmentation With Transformers CVPR 2022
Scale-Equivalent Distillation for Semi-Supervised Object Detection CVPR 2022
PoseTrans: A Simple yet Effective Pose Transformation Augmentation for Human Pose Estimation ECCV 2022
3D Interacting Hand Pose Estimation by Hand De-Occlusion and Removal ECCV 2022
Pose for Everything: Towards Category-Agnostic Pose Estimation ECCV 2022
Towards Grand Unification of Object Tracking ECCV 2022
ByteTrack: Multi-Object Tracking by Associating Every Detection Box ECCV 2022
DaViT: Dual Attention Vision Transformers ECCV 2022
Not All Models Are Equal: Predicting Model Transferability in a Self-Challenging Fisher Space ECCV 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-Text Retrieval ECCV 2022
Objects in Semantic Topology ICLR 2022
Dynamic Token Normalization improves Vision Transformers ICLR 2022
Learning Versatile Neural Architectures by Propagating Network Codes ICLR 2022
Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization ICLR 2022
CycleMLP: A MLP-like Architecture for Dense Prediction ICLR 2022
Flow-based Recurrent Belief State Learning for POMDPs ICML 2022
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer ICML 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix ICML 2022
Don’t Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning IJCAI 2022
Compensation Tracker: Reprocessing Lost Object for Multi-Object Tracking WACV 2022
Disentangled Cycle Consistency for Highly-Realistic Virtual Try-On CVPR 2021
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks CVPR 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions ICCV 2021
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution ICML 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers NIPS 2021
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs ICLR 2021
Compressed Video Contrastive Learning NIPS 2021
Segmenting Transparent Objects in the Wild with Transformer IJCAI 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language NIPS 2021
DetCo: Unsupervised Contrastive Learning for Object Detection ICCV 2021
Adversarial Robustness for Unsupervised Domain Adaptation ICCV 2021
Watch Only Once: An End-to-End Video Action Detection Framework ICCV 2021
Bringing Events Into Video Deblurring With Non-Consecutively Blurry Frames ICCV 2021
STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement ICCV 2021
End-to-End Dense Video Captioning With Parallel Decoding ICCV 2021
Model-Based Reinforcement Learning via Imagination with Derived Memory NIPS 2021
What Makes for End-to-End Object Detection? ICML 2021
Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning NIPS 2021
Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers AAAI 2021
A Unified Multi-Scenario Attacking Network for Visual Object Tracking AAAI 2021
A Bottom-Up DAG Structure Extraction Model for Math Word Problems AAAI 2021
Rethinking the Pruning Criteria for Convolutional Neural Network NIPS 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers CVPR 2021
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search CVPR 2021
Parser-Free Virtual Try-On via Distilling Appearance Flows CVPR 2021
Sparse R-CNN: End-to-End Object Detection With Learnable Proposals CVPR 2021
3D Human Mesh Regression With Dense Correspondence CVPR 2020
Segmenting Transparent Objects in the Wild ECCV 2020
Whole-Body Human Pose Estimation in the Wild ECCV 2020
Webly Supervised Image Classification with Self-Contained Confidence ECCV 2020
Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation ECCV 2020
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation ECCV 2020
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting ECCV 2020
Dynamic and Static Context-aware LSTM for Multi-agent Motion Prediction ECCV 2020
PolarMask: Single Shot Instance Segmentation With Polar Representation CVPR 2020
Exemplar Normalization for Learning Deep Representation CVPR 2020
Online Knowledge Distillation via Collaborative Learning CVPR 2020
Learning Depth-Guided Convolutions for Monocular 3D Object Detection CVPR 2020
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content CVPR 2020
MaskGAN: Towards Diverse and Interactive Facial Image Manipulation CVPR 2020
Learning a Reinforced Agent for Flexible Exposure Bracketing Selection CVPR 2020
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow AAAI 2020
Channel Equilibrium Networks for Learning Deep Representation ICML 2020
Differentiable Dynamic Normalization for Learning Deep Representation ICML 2019
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images CVPR 2019
SSN: Learning Sparse Switchable Normalization via SparsestMax CVPR 2019
Switchable Whitening for Deep Representation Learning ICCV 2019
Deep Self-Learning From Noisy Labels ICCV 2019
Vision-Infused Deep Audio Inpainting ICCV 2019
Differentiable Learning-to-Normalize via Switchable Normalization ICLR 2019
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation AAAI 2019
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization ICCV 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks ICCV 2019
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid ICCV 2019
Towards Understanding Regularization in Batch Normalization ICLR 2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once ICCV 2019
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net ECCV 2018
Adaboost with Auto-Evaluation for Conversational Models IJCAI 2018
FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis CVPR 2018
Kalman Normalization: Normalizing Internal Representations Across Network Layers NIPS 2018
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade CVPR 2017
Learning Object Interactions and Descriptions for Semantic Image Segmentation CVPR 2017
EigenNet: Towards Fast and Structural Learning of Deep Neural Networks IJCAI 2017
Learning Deep Architectures via Generalized Whitened Neural Networks ICML 2017
Deep Dual Learning for Semantic Image Segmentation ICCV 2017
Browsing Regularities in Hedonic Content Systems IJCAI 2016
WIDER FACE: A Face Detection Benchmark CVPR 2016
DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations CVPR 2016
Deep Learning Face Attributes in the Wild ICCV 2015
DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection CVPR 2015
Pedestrian Detection Aided by Deep Learning Semantic Tasks CVPR 2015
Matrix Factorization with Scale-Invariant Parameters IJCAI 2015
Supervised Representation Learning: Transfer Learning with Deep Autoencoders IJCAI 2015
Semantic Image Segmentation via Deep Parsing Network ICCV 2015
Deep Learning Strong Parts for Pedestrian Detection ICCV 2015
Learning Social Relation Traits From Face Images ICCV 2015
From Facial Parts Responses to Face Detection: A Deep Learning Approach ICCV 2015
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification CVPR 2015
Switchable Deep Network for Pedestrian Detection CVPR 2014
Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations NIPS 2014
Clothing Co-Parsing by Joint Image Segmentation and Labeling CVPR 2014
Deep Learning Identity-Preserving Face Space ICCV 2013
Pedestrian Parsing via Deep Decompositional Network ICCV 2013
A Deep Sum-Product Architecture for Robust Facial Attributes Analysis ICCV 2013
Concept Learning for Cross-Domain Text Classification: A General Probabilistic Framework IJCAI 2013