Yongming Rao
49 papers · 2017–2026 · 9 conferences · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Conference Polyglot (9)
Academic Marathon (9)
Renaissance Researcher (7)
Taxonomy Completionist (87)
Keyword Pioneer
Hot Topic Early Bird
Conference Loyalist (20)
Dynamic Duo (42)
Grand Slam
Topic Evolution
Keyword Champion
Deep Specialist (12)
Keyword Collector (219)
The Questioner
Prolific Year (7)
Century Club (49)
Unstoppable (10)
Conferences
CVPR (20)
ICCV (12)
NIPS (6)
ECCV (5)
ICLR (2)
AAAI (1)
CORL (1)
ICML (1)
WACV (1)
Keywords
point cloud (8)
semantic segmentation (5)
diffusion model (5)
video understanding (4)
contrastive learning (4)
vision transformer (3)
depth estimation (3)
multimodal learning (3)
action recognition (3)
object detection (3)
3d vision (2)
representation learning (2)
3d object detection (2)
reinforcement learning (2)
domain adaptation (2)
3d reconstruction (2)
image generation (2)
attention mechanism (2)
image restoration (2)
knowledge distillation (2)
Papers
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
WACV 2026
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
ICLR 2025
RBench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
ICML 2025
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
ICCV 2025
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
CVPR 2025
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
CVPR 2025
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
CVPR 2024
Generative Multimodal Models are In-Context Learners
CVPR 2024
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
CVPR 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
ECCV 2024
Unleashing Text-to-Image Diffusion Models for Visual Perception
ICCV 2023
TCOVIS: Temporally Consistent Online Video Instance Segmentation
ICCV 2023
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
ICCV 2023
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
NIPS 2023
FLAG3D: A 3D Fitness Activity Dataset With Language Instruction
CVPR 2023
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
ICLR 2023
DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
CVPR 2023
AMixer: Adaptive Weight Mixing for Self-Attention Free Vision Transformers
ECCV 2022
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
NIPS 2022
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
NIPS 2022
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation
CORL 2022
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
CVPR 2022
FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment
CVPR 2022
Back to Reality: Weakly-Supervised 3D Object Detection With Shape-Guided Label Enhancement
CVPR 2022
Point-BERT: Pre-Training 3D Point Cloud Transformers With Masked Point Modeling
CVPR 2022
DenseCLIP: Language-Guided Dense Prediction With Context-Aware Prompting
CVPR 2022
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
CVPR 2022
LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection
ECCV 2022
PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds
CVPR 2021
Global Filter Networks for Image Classification
NIPS 2021
Group-Aware Contrastive Regression for Action Quality Assessment
ICCV 2021
PoinTr: Diverse Point Cloud Completion With Geometry-Aware Transformers
ICCV 2021
RandomRooms: Unsupervised Pre-Training From Synthetic Shapes and Randomized Layouts for 3D Object Detection
ICCV 2021
NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo
ICCV 2021
Towards Interpretable Deep Metric Learning With Structural Matching
ICCV 2021
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-Identification
ICCV 2021
Multi-Proxy Wasserstein Classifier for Image Classification
AAAI 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
NIPS 2021
Structure-Preserving Super Resolution With Gradient Guidance
CVPR 2020
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds
CVPR 2020
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
ECCV 2020
Deep Face Super-Resolution With Iterative Collaboration Between Attentive Recovery and Landmark Estimation
CVPR 2020
Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?
ECCV 2020
COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis
CVPR 2019
Spherical Fractal Convolutional Neural Networks for Point Cloud Recognition
CVPR 2019
Learning Globally Optimized Object Detector via Policy Gradient
CVPR 2018
Attention-Aware Deep Reinforcement Learning for Video Face Recognition
ICCV 2017
Runtime Neural Pruning
NIPS 2017
Learning Discriminative Aggregation Network for Video-Based Face Recognition
ICCV 2017