Yongming Rao

49 papers · 2017–2026 · 9 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (9) 🏃 Academic Marathon (9) 🌈 Renaissance Researcher (7)

🗺️ Taxonomy Completionist (87) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (20) 🤝 Dynamic Duo (42) 🏆 Grand Slam 🧬 Topic Evolution 🏆 Keyword Champion 🔬 Deep Specialist (12) 🗃️ Keyword Collector (219) ❓ The Questioner ⚡ Prolific Year (7) 💎 Century Club (49) 🔥 Unstoppable (10)

Conferences

CVPR (20) ICCV (12) NIPS (6) ECCV (5) ICLR (2) AAAI (1) CORL (1) ICML (1) WACV (1)

Top co-authors

Jiwen Lu (42) Jie Zhou (38) Wenliang Zhao (10) Guangyi Chen (7) Xumin Yu (7) Benlin Liu (7) Zuyan Liu (7) Ziyi Wang (6) Yi Wei (6) Yansong Tang (5)

Keywords

point cloud (8) semantic segmentation (5) diffusion model (5) video understanding (4) contrastive learning (4) vision transformer (3) depth estimation (3) multimodal learning (3) action recognition (3) object detection (3) 3d vision (2) representation learning (2) 3d object detection (2) reinforcement learning (2) domain adaptation (2) 3d reconstruction (2) image generation (2) attention mechanism (2) image restoration (2) knowledge distillation (2)

Papers

BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries (WACV 2026)
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution (ICLR 2025)
RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation (ICML 2025)
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs (ICCV 2025)
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models (CVPR 2025)
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model (CVPR 2025)
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition (CVPR 2024)
Generative Multimodal Models are In-Context Learners (CVPR 2024)
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior (CVPR 2024)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache (ECCV 2024)
Unleashing Text-to-Image Diffusion Models for Visual Perception (ICCV 2023)
TCOVIS: Temporally Consistent Online Video Instance Segmentation (ICCV 2023)
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models (ICCV 2023)
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models (NIPS 2023)
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction (CVPR 2023)
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models (ICLR 2023)
DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion (CVPR 2023)
AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers (ECCV 2022)
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions (NIPS 2022)
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting (NIPS 2022)
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation (CORL 2022)
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion (CVPR 2022)
FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment (CVPR 2022)
Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement (CVPR 2022)
Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling (CVPR 2022)
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting (CVPR 2022)
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation (CVPR 2022)
LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection (ECCV 2022)
PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds (CVPR 2021)
Global Filter Networks for Image Classification (NIPS 2021)
Group-Aware Contrastive Regression for Action Quality Assessment (ICCV 2021)
PoinTr: Diverse Point Cloud Completion With Geometry-Aware Transformers (ICCV 2021)
RandomRooms: Unsupervised Pre-Training From Synthetic Shapes and Randomized Layouts for 3D Object Detection (ICCV 2021)
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo (ICCV 2021)
Towards Interpretable Deep Metric Learning With Structural Matching (ICCV 2021)
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-Identification (ICCV 2021)
Multi-Proxy Wasserstein Classifier for Image Classification (AAAI 2021)
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification (NIPS 2021)
Structure-Preserving Super Resolution With Gradient Guidance (CVPR 2020)
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds (CVPR 2020)
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation (ECCV 2020)
Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation (CVPR 2020)
Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification? (ECCV 2020)
COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis (CVPR 2019)
Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition (CVPR 2019)
Learning Globally Optimized Object Detector via Policy Gradient (CVPR 2018)
Attention-Aware Deep Reinforcement Learning for Video Face Recognition (ICCV 2017)
Runtime Neural Pruning (NIPS 2017)
Learning Discriminative Aggregation Network for Video-Based Face Recognition (ICCV 2017)