Ruimao Zhang

45 papers · 2016–2025 · 10 top CS/AI conferences

Achievements

🌈 Renaissance Researcher (8) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (10) 🏃 Academic Marathon (9) 🗺️ Taxonomy Completionist (70)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏆 Grand Slam 🤝 Dynamic Duo (13) 🧬 Topic Evolution 🔬 Deep Specialist (10) 👑 Triple Crown 🔥 Unstoppable (7) ⚡ Prolific Year (11) 💎 Century Club (45) 🗃️ Keyword Collector (168)

Conferences

CVPR (14) ICCV (7) ECCV (5) NIPS (5) ICLR (4) AAAI (2) CORL (2) ICML (2) IJCAI (2) MIDL (2)

Top co-authors

Zhen Li (13) Ping Luo (10) Jie Yang (10) Ailing Zeng (7) Shuguang Cui (7) Xu Yan (7) Yiran Qin (6) Lei Zhang (6) Zhanglin Peng (4) Jiantao Gao (4)

Keywords

semantic segmentation (8) convolutional neural network (4) multimodal large language model (4) knowledge distillation (3) pose estimation (3) image generation (3) multi-modal learning (3) point cloud (3) virtual try-on (2) image segmentation (2) visual prompt (2) diffusion model (2) neural network optimization (2) motion generation (2) image retrieval (2) 3d vision (2) batch normalization (2) cross-modal learning (2) 3d object detection (2) human parsing (2)

Papers

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model CVPR 2025
High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation ICLR 2025
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation CVPR 2025
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints ICCV 2025
WorldSimBench: Towards Video Generation Models as World Simulators ICML 2025
Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration CORL 2025
CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion CORL 2025
HumanTOMATO: Text-aligned Whole-body Motion Generation ICML 2024
KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension NIPS 2024
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer AAAI 2024
Open-World Human-Object Interaction Detection via Multi-modal Prompts CVPR 2024
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models CVPR 2024
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions CVPR 2024
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception CVPR 2024
SEED-Bench: Benchmarking Multimodal Large Language Models CVPR 2024
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions ECCV 2024
X-Pose: Detecting Any Keypoints ECCV 2024
Enhancing Human-AI Collaboration Through Logic-Guided Reasoning ICLR 2024
Discovering Intrinsic Spatial-Temporal Logic Rules to Explain Human Actions NIPS 2023
Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation MIDL 2023
Neural Interactive Keypoint Detection ICCV 2023
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection ICCV 2023
Semantic Human Parsing via Scalable Semantic Transfer Over Multiple Label Domains CVPR 2023
Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation ICLR 2023
Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset NIPS 2023
Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency MIDL 2023
Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis NIPS 2022
AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation NIPS 2022
Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration ECCV 2022
2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds ECCV 2022
End-to-End Dense Video Captioning With Parallel Decoding ICCV 2021
Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion AAAI 2021
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds Through Instance Multi-Level Contextual Referring ICCV 2021
PointLIE: Locally Invertible Embedding for Point Cloud Sampling and Recovery IJCAI 2021
Parser-Free Virtual Try-On via Distilling Appearance Flows CVPR 2021
Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content CVPR 2020
Exemplar Normalization for Learning Deep Representation CVPR 2020
Towards Content-Independent Multi-Reference Super-Resolution: Adaptive Pattern Matching and Feature Aggregation ECCV 2020
SSN: Learning Sparse Switchable Normalization via SparsestMax CVPR 2019
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks ICCV 2019
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once ICCV 2019
Differentiable Learning-to-Normalize via Switchable Normalization ICLR 2019
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images CVPR 2019
Deep Structured Scene Parsing by Learning With Image Descriptions CVPR 2016
Geometric Scene Parsing with Hierarchical LSTM IJCAI 2016