Junsong Yuan

99 papers · 2012–2026 · 9 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (11) 🏃 Academic Marathon (14) 🏠 Conference Loyalist (33) 🔬 Deep Specialist (18) 🧬 Topic Evolution 🏆 Keyword Champion 👥 Mega-Team (35) 🗃️ Keyword Collector (369) ⚡ Prolific Year (11) 🚀 Conference Pioneer 📈 Trend Setter 💎 Century Club (97) 🔥 Unstoppable (15) ❓ The Questioner

Conferences

CVPR (33) ICCV (24) ECCV (21) AAAI (9) IJCAI (4) WACV (4) NIPS (2) ACML (1) ICLR (1)

Top co-authors

Liuhao Ge (9) David Doermann (9) Tianyu Luan (8) Yuanhao Zhai (8) Yi Xu (7) Liangchen Song (7) Jingjing Meng (7) Ming Yang (6) Jialian Wu (6) Gang Hua (6)

Keywords

hand pose estimation (10) action recognition (9) video understanding (8) point cloud (6) domain adaptation (6) object detection (5) semantic segmentation (5) weakly supervised learning (5) 3d hand pose estimation (5) convolutional neural network (5) 3d reconstruction (5) 3d vision (4) depth image (4) 3d pose estimation (4) data augmentation (3) zero-shot learning (3) pedestrian detection (3) semi-supervised learning (3) human pose estimation (3) synthetic data (3)

Papers

Textured Geometry Evaluation: Perceptual 3D Textured Shape Metric via 3D Latent-Geometry Network · AAAI 2026
Chain-of-Look Spatial Reasoning for Dense Surgical Instrument Counting · WACV 2026
SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation · AAAI 2026
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis · CVPR 2025
PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions · ICCV 2025
Recognizing Actions from Robotic View for Natural Human-Robot Interaction · ICCV 2025
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models · ICCV 2025
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation · ICCV 2025
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling · ICCV 2025
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation · ECCV 2024
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation · ECCV 2024
Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation · CVPR 2024
GRiT: A Generative Region-to-text Transformer for Object Understanding · ECCV 2024
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation · ECCV 2024
FSC: Few-point Shape Completion · CVPR 2024
Show Your Face: Restoring Complete Facial Images From Partial Observations for VR Meeting · WACV 2024
Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition · ECCV 2024
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images · ECCV 2024
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation · NIPS 2024
High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition · CVPR 2023
Progressive Multi-View Human Mesh Recovery with Self-Supervision · AAAI 2023
Neural Voting Field for Camera-Space 3D Hand Pose Estimation · CVPR 2023
3D-Aware Facial Landmark Detection via Multi-View Consistent Training on Synthetic Data · CVPR 2023
Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning · ICCV 2023
SOAR: Scene-debiasing Open-set Action Recognition · ICCV 2023
Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting · ICCV 2023
NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions · ICCV 2023
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting · ICCV 2023
Self-Supervised Distilled Learning for Multi-Modal Misinformation Identification · WACV 2023
Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth · WACV 2023
AiATrack: Attention in Attention for Transformer Visual Tracking · ECCV 2022
PREF: Predictability Regularized Neural Motion Fields · ECCV 2022
Neural Correspondence Field for Object Pose Estimation · ECCV 2022
Efficient Video Instance Segmentation via Tracklet Query and Proposal · CVPR 2022
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning · AAAI 2022
Learning Transferable Human-Object Interaction Detector With Natural Language Supervision · CVPR 2022
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video · CVPR 2022
Stacked Homography Transformations for Multi-View Pedestrian Detection · ICCV 2021
Model-Based 3D Hand Reconstruction via Self-Supervised Learning · CVPR 2021
Track To Detect and Segment: An Online Multi-Object Tracker · CVPR 2021
Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective · ICLR 2021
Robust Knowledge Transfer via Hybrid Forward on the Teacher-Student Model · AAAI 2021
ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization · AAAI 2021
Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context · AAAI 2021
A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder · ICCV 2021
High Quality Disparity Remapping With Two-Stage Warping · ICCV 2021
Discovering Human Interactions With Large-Vocabulary Objects via Query and Multi-Scale Detection · ICCV 2021
Discovering Human Interactions With Novel Objects via Zero-Shot Learning · CVPR 2020
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization · ECCV 2020
Learning Progressive Joint Propagation for Human Motion Prediction · ECCV 2020
Temporal Distinct Representation Learning for Action Recognition · ECCV 2020
Clustering Driven Deep Autoencoder for Video Anomaly Detection · ECCV 2020
Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction · ECCV 2020
Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation · ECCV 2020
Structure-Aware Human-Action Generation · ECCV 2020
Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions · AAAI 2020
Temporal-Context Enhanced Detection of Heavily Occluded Pedestrians · CVPR 2020
3DV: 3D Dynamic Voxel for Action Recognition in Depth Video · CVPR 2020
SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning · ICCV 2019
Bayesian Uncertainty Matching for Unsupervised Domain Adaptation · IJCAI 2019
A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image · ICCV 2019
PointCloud Saliency Maps · ICCV 2019
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks · ICCV 2019
Temporal Structure Mining for Weakly Supervised Action Detection · ICCV 2019
Discriminative Feature Transformation for Occluded Pedestrian Detection · ICCV 2019
SPAGAN: Shortest Path Graph Attention Network · IJCAI 2019
Exploiting Local Feature Patterns for Unsupervised Domain Adaptation · AAAI 2019
Kervolutional Neural Networks · CVPR 2019
Joint Representative Selection and Feature Learning: A Semi-Supervised Approach · CVPR 2019
3D Hand Shape and Pose Estimation From a Single RGB Image · CVPR 2019
Conditional Generative Adversarial Network for Structured Domain Adaptation · CVPR 2018
Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display · CVPR 2018
Depth-Based 3D Hand Pose Estimation: From Current Achievements to Future Goals · CVPR 2018
Bi-box Regression for Pedestrian Detection and Occlusion Estimation · ECCV 2018
Point-to-Point Regression PointNet for 3D Hand Pose Estimation · ECCV 2018
Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images · ECCV 2018
Hand PointNet: 3D Hand Pose Estimation Using Point Sets · CVPR 2018
Deformable Pose Traversal Convolution for 3D Action and Gesture Recognition · ECCV 2018
Product Quantization Network for Fast Image Retrieval · ECCV 2018
Multi-View Harmonized Bilinear Network for 3D Object Recognition · CVPR 2018
Recognizing Human Actions as the Evolution of Pose Estimation Maps · CVPR 2018
Multi-Label Learning of Part Detectors for Heavily Occluded Pedestrian Detection · ICCV 2017
Spatio-Temporal Naive-Bayes Nearest-Neighbor (ST-NBNN) for Skeleton-Based Action Recognition · CVPR 2017
Object Co-Skeletonization With Co-Segmentation · CVPR 2017
Is My Object in This Video? Reconstruction-based Object Search in Videos · IJCAI 2017
3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation From Single Depth Images · CVPR 2017
HOPE: Hierarchical Object Prototype Encoding for Efficient Object Instance Search in Videos · CVPR 2017
Fried Binary Embedding for High-Dimensional Visual Features · CVPR 2017
Compressive Quantization for Fast Object Instance Search in Videos · ICCV 2017
Common Action Discovery and Localization in Unconstrained Videos · ICCV 2017
To Project More or to Quantize More: Minimize Reconstruction Bias for Learning Compact Binary Codes · IJCAI 2016
Robust 3D Hand Pose Estimation in Single Depth Images: From Single-View CNN to Multi-View CNNs · CVPR 2016
From Keyframes to Key Objects: Video Summarization by Representative Object Proposal Selection · CVPR 2016
Adaptive Exponential Smoothing for Online Filtering of Pixel Prediction Maps · ICCV 2015
Fast Action Proposals for Human Action Detection and Search · CVPR 2015
Multi-feature Spectral Clustering with Minimax Optimization · CVPR 2014
Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior · CVPR 2013
Max-Margin Structured Output Regression for Spatio-Temporal Action Localization · NIPS 2012
Spatial Locality-Aware Sparse Coding and Dictionary Learning · ACML 2012