Jingdong Wang
153 papers · 2013–2026 · 8 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (17)
Interdisciplinary Bridge
Renaissance Researcher (6)
Hot Topic Early Bird
Keyword Trendsetter Combo (5)
Conference Loyalist (20)
Topic Pioneer
Deep Specialist (21)
Topic Evolution
Keyword Champion (3)
Dynamic Duo (51)
Grand Slam
Century Club (152)
Trend Setter
Conference Pioneer
Prolific Year (30)
Unstoppable (13)
The Questioner (3)
Keyword Collector (526)
Conferences
CVPR (56)
ICCV (27)
ECCV (22)
NIPS (20)
AAAI (9)
ICLR (7)
ICML (6)
IJCAI (6)
Keywords
semantic segmentation (13)
object detection (12)
convolutional neural network (11)
diffusion model (10)
human pose estimation (9)
semi-supervised learning (7)
vision-language model (7)
person re-identification (7)
image classification (6)
representation learning (6)
attention mechanism (6)
contrastive learning (6)
vision transformer (6)
video generation (6)
video understanding (5)
image retrieval (5)
knowledge distillation (5)
temporal modeling (5)
image generation (5)
few-shot learning (5)
Papers
EM-KD: Distilling Efficient Multimodal Large Language Model with Unbalanced Vision Tokens
AAAI 2026
SpotActor: Training-Free Layout-Controlled Consistent Image Generation
AAAI 2025
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models
AAAI 2025
DynaMind: Reasoning over Abstract Video Dynamics for Embodied Decision-Making
ICML 2025
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
CVPR 2025
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers?
CVPR 2025
Manifold Constraint Reduces Exposure Bias in Accelerated Diffusion Sampling
ICLR 2025
MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction
ICLR 2025
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
ICLR 2025
Low-Biased General Annotated Dataset Generation
CVPR 2025
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
CVPR 2025
Action Detail Matters: Refining Video Recognition with Local Action Queries
CVPR 2025
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
CVPR 2025
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
CVPR 2025
Continual SFT Matches Multimodal RLHF with Negative Supervision
CVPR 2025
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
CVPR 2025
TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer
CVPR 2025
VidEvo: Evolving Video Editing through Exhaustive Temporal Modeling
IJCAI 2025
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing
NIPS 2024
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation
AAAI 2024
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation
AAAI 2024
Multi-Domain Incremental Learning for Face Presentation Attack Detection
AAAI 2024
A Multimodal, Multi-Task Adapting Framework for Video Action Recognition
AAAI 2024
Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
ICML 2024
Towards Unified Multi-granularity Text Detection with Interactive Attention
ICML 2024
BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection
CVPR 2024
GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
ECCV 2024
Timestep-Aware Correction for Quantized Diffusion Models
ECCV 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
ECCV 2024
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
ECCV 2024
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
ECCV 2024
Automated Multi-level Preference for MLLMs
NIPS 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
NIPS 2024
Dense Connector for MLLMs
NIPS 2024
PLIP: Language-Image Pre-training for Person Representation Learning
NIPS 2024
ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling
NIPS 2024
Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery
NIPS 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
SEED: A Simple and Effective 3D DETR in Point Clouds
ECCV 2024
IRGen: Generative Modeling for Image Retrieval
ECCV 2024
Interactive 3D Object Detection with Prompts
ECCV 2024
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
ECCV 2024
Evaluation of Text-to-Video Generation Models: A Dynamics Perspective
NIPS 2024
Let the Avatar Talk using Texts without Paired Training Data
ECCV 2024
LION: Linear Group RNN for 3D Object Detection in Point Clouds
NIPS 2024
OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding
NIPS 2024
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
CVPR 2024
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
CVPR 2024
VRP-SAM: SAM with Visual Reference Prompt
CVPR 2024
MS-DETR: Efficient DETR Training with Mixed Supervision
CVPR 2024
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
CVPR 2024
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
CVPR 2024
Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection
NIPS 2023
HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
NIPS 2023
DAC-DETR: Divide the Attention Layers and Conquer
NIPS 2023
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
ICCV 2023
CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation
ICCV 2023
Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation
CVPR 2023
Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation
CVPR 2023
Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection
CVPR 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models
CVPR 2023
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
CVPR 2023
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
CVPR 2023
PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation With Progressive Video Transformers
CVPR 2023
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
CVPR 2023
Semi-DETR: Semi-Supervised Object Detection With Detection Transformers
CVPR 2023
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
CVPR 2023
Forward Flow for Novel View Synthesis of Dynamic Scenes
ICCV 2023
Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
ICCV 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
ICLR 2023
Graph Contrastive Learning for Skeleton-based Action Recognition
ICLR 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
ICCV 2023
Cyclically Disentangled Feature Translation for Face Anti-spoofing
AAAI 2023
CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision
ICCV 2023
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
ICCV 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
ICCV 2023
Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection
ICCV 2023
Group Pose: A Simple Baseline for End-to-End Multi-Person Pose Estimation
ICCV 2023
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
ICCV 2023
Robust Video Portrait Reenactment via Personalized Representation Quantization
AAAI 2023
σ-Adaptive Decoupled Prototype for Few-Shot Object Detection
ICCV 2023
Unified Pre-Training with Pseudo Texts for Text-To-Image Person Re-Identification
ICCV 2023
Learning Versatile Neural Architectures by Propagating Network Codes
ICLR 2022
Delving into Sequential Patches for Deepfake Detection
NIPS 2022
RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
NIPS 2022
Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning
NIPS 2022
Human-Object Interaction Detection via Disentangled Transformer
CVPR 2022
Few-Shot Head Swapping in the Wild
CVPR 2022
Few-Shot Font Generation by Learning Fine-Grained Local Styles
CVPR 2022
MixFormer: Mixing Features Across Windows and Dimensions
CVPR 2022
Expressive Talking Head Generation With Granular Audio-Visual Control
CVPR 2022
Implicit Sample Extension for Unsupervised Person Re-Identification
CVPR 2022
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
CVPR 2022
GitNet: Geometric Prior-Based Transformation for Birds-Eye-View Segmentation
ECCV 2022
Action Quality Assessment with Temporal Parsing Transformer
ECCV 2022
StyleSwap: Style-Based Generator Empowers Robust Face Swapping
ECCV 2022
DaViT: Dual Attention Vision Transformers
ECCV 2022
UFO: Unified Feature Optimization
ECCV 2022
Diverse Learner: Exploring Diverse Supervision for Semi-Supervised Object Detection
ECCV 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
ECCV 2022
On the Connection between Local Attention and Dynamic Depth-wise Convolution
ICLR 2022
Self-Guided Hard Negative Generation for Unsupervised Person Re-Identification
IJCAI 2022
Conditional DETR for Fast Training Convergence
ICCV 2021
HRFormer: High-Resolution Vision Transformer for Dense Predict
NIPS 2021
Semi-Supervised Semantic Segmentation With Cross Pseudo Supervision
CVPR 2021
Admix: Enhancing the Transferability of Adversarial Attacks
ICCV 2021
Bottom-Up Human Pose Estimation via Disentangled Keypoint Regression
CVPR 2021
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search
NIPS 2021
Lite-HRNet: A Lightweight High-Resolution Network
CVPR 2021
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation
CVPR 2020
Weakly-Supervised Action Localization by Generative Attention Modeling
CVPR 2020
SegFix: Model-Agnostic Boundary Refinement for Segmentation
ECCV 2020
Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
ICML 2020
Object-Contextual Representations for Semantic Segmentation
ECCV 2020
Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation
ECCV 2020
Efficient Semantic Video Segmentation with Per-frame Inference
ECCV 2020
Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution
CVPR 2020
Cross View Fusion for 3D Human Pose Estimation
ICCV 2019
Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation
IJCAI 2019
Structured Knowledge Distillation for Semantic Segmentation
CVPR 2019
Deep High-Resolution Representation Learning for Human Pose Estimation
CVPR 2019
S4Net: Single Stage Salient-Instance Segmentation
CVPR 2019
Global-Local Temporal Representations for Video Person Re-Identification
ICCV 2019
Interleaved Structured Sparse Convolutional Neural Networks
CVPR 2018
Global Versus Localized Generative Adversarial Nets
CVPR 2018
Weakly Supervised Dense Event Captioning in Videos
NIPS 2018
Deep Convolutional Neural Networks with Merge-and-Run Mappings
IJCAI 2018
Part-Aligned Bilinear Representations for Person Re-Identification
ECCV 2018
Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing
CVPR 2018
Interleaved Group Convolutions
ICCV 2017
Ensemble Diffusion for Retrieval
ICCV 2017
Deeply-Learned Part-Aligned Representations for Person Re-Identification
ICCV 2017
Human Pose Estimation Using Global and Local Normalization
ICCV 2017
Random Shifting for CNN: a Solution to Reduce Information Loss in Down-Sampling Layers
IJCAI 2017
DisturbLabel: Regularizing CNN on the Loss Layer
CVPR 2016
Supervised Quantization for Similarity Search
CVPR 2016
InterActive: Inter-Layer Activeness Propagation
CVPR 2016
Collaborative Quantization for Cross-Modal Similarity Search
CVPR 2016
Co-Saliency Detection via Looking Deep and Wide
CVPR 2015
Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification
CVPR 2015
Person Re-Identification With Correspondence Structure Learning
ICCV 2015
Quantized Correlation Hashing for Fast Cross-Modal Search
IJCAI 2015
Scalable Person Re-Identification: A Benchmark
ICCV 2015
RIDE: Reversal Invariant Descriptor Enhancement
ICCV 2015
Sparse Composite Quantization
CVPR 2015
Orientational Pyramid Matching for Recognizing Indoor Scenes
CVPR 2014
Composite Quantization for Approximate Nearest Neighbor Search
ICML 2014
Online Robust Non-negative Dictionary Learning for Visual Tracking
ICCV 2013
Fixed-Point Model For Structured Labeling
ICML 2013
Fast Neighborhood Graph Search Using Cartesian Concatenation
ICCV 2013
Supervised Kernel Descriptors for Visual Recognition
CVPR 2013
Salient Object Detection: A Discriminative Regional Feature Integration Approach
CVPR 2013
Learning CRFs for Image Parsing with Adaptive Subgradient Descent
ICCV 2013