Mingyu Ding

51 papers · 2018–2026 · 11 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (14)

🌈 Renaissance Researcher (7) 🗺️ Taxonomy Completionist (71) 🧬 Topic Evolution 👑 Triple Crown 🤝 Dynamic Duo (23) 🏆 Grand Slam 🗃️ Keyword Collector (186) 💎 Century Club (49) ⚡ Prolific Year (10) 🔥 Unstoppable (8)

Conferences

CVPR (11) NIPS (10) ICLR (9) AAAI (4) ECCV (4) ICML (4) CORL (3) ICCV (3) IJCAI (1) RSS (1) WACV (1)

Top co-authors

Ping Luo (23) Zhiwu Lu (15) Masayoshi Tomizuka (11) Wei Zhan (10) Yao Mu (9) Chuang Gan (9) Zhenfang Chen (8) Tao Xiang (7) Yuqi Huo (7) Chenfeng Xu (6)

Keywords

foundation model (4) self-supervised learning (3) transfer learning (3) multi-modal learning (3) contrastive learning (3) robotic manipulation (3) robot manipulation (2) video understanding (2) diffusion model (2) multimodal learning (2) adversarial learning (2) representation learning (2) autonomous driving (2) multi-task learning (2) object detection (2) pose estimation (2) unsupervised learning (2) image retrieval (2) domain adaptation (2) depth estimation (2)

Papers

Unlocking the Power of Large Multimodal Models for Robot Learning: Robustness, Generalization, and Opportunities AAAI 2026
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction AAAI 2026
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation CVPR 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ICCV 2025
WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving ICML 2025
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios ICLR 2025
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins CVPR 2025
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models ICLR 2025
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians CVPR 2025
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models ICLR 2024
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling ICLR 2024
Human-oriented Representation Learning for Robotic Manipulation RSS 2024
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution CVPR 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts NIPS 2024
Interfacing Foundation Models' Embeddings NIPS 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning CORL 2024
Q-SLAM: Quadric Representations for Monocular SLAM CORL 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis ICML 2024
VDT: General-purpose Video Diffusion Transformers via Mask Modeling ICLR 2024
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions ICCV 2023
Towards Free Data Selection with General-Purpose Models NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought NIPS 2023
Doubly-Robust Self-Training NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties NIPS 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention CVPR 2023
EC2: Emergent Communication for Embodied Control CVPR 2023
Planning with Large Language Models for Code Generation ICLR 2023
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners ICML 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following CORL 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos ICLR 2022
Learning Versatile Neural Architectures by Propagating Network Codes ICLR 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling NIPS 2022
DaViT: Dual Attention Vision Transformers ECCV 2022
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer ICML 2022
Domain-Adaptive Few-Shot Learning WACV 2021
A Global Occlusion-Aware Approach to Self-Supervised Monocular Visual Odometry AAAI 2021
Compressed Video Contrastive Learning NIPS 2021
Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw IJCAI 2021
IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning ICLR 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language NIPS 2021
L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing CVPR 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers CVPR 2021
Segmenting Transparent Objects in the Wild ECCV 2020
Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking ECCV 2020
Learning Depth-Guided Convolutions for Monocular 3D Object Detection CVPR 2020
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow AAAI 2020
Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation ECCV 2020
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization ICCV 2019
Face-Focused Cross-Stream Network for Deception Detection in Videos CVPR 2019
Domain-Invariant Projection Learning for Zero-Shot Recognition NIPS 2018