Mingyu Ding
51 papers · 2018–2026 · 11 top CS/AI conferences
Achievements
Conference Polyglot
(11)
Academic Marathon
(7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator
(14)
Renaissance Researcher
(7)
Taxonomy Completionist
(71)
Topic Evolution
Triple Crown
Dynamic Duo
(23)
Grand Slam
Keyword Collector
(186)
Century Club
(49)
Prolific Year
(10)
Unstoppable
(8)
Conferences
CVPR (11)
NIPS (10)
ICLR (9)
AAAI (4)
ECCV (4)
ICML (4)
CORL (3)
ICCV (3)
IJCAI (1)
RSS (1)
WACV (1)
Keywords
foundation model
(4)
self-supervised learning
(3)
transfer learning
(3)
multi-modal learning
(3)
contrastive learning
(3)
robotic manipulation
(3)
robot manipulation
(2)
video understanding
(2)
diffusion model
(2)
multimodal learning
(2)
adversarial learning
(2)
representation learning
(2)
autonomous driving
(2)
multi-task learning
(2)
object detection
(2)
pose estimation
(2)
unsupervised learning
(2)
image retrieval
(2)
domain adaptation
(2)
depth estimation
(2)
Papers
Unlocking the Power of Large Multimodal Models for Robot Learning: Robustness, Generalization, and Opportunities
AAAI 2026
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction
AAAI 2026
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
CVPR 2025
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
ICML 2025
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
ICLR 2025
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
CVPR 2025
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
ICLR 2025
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
CVPR 2025
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
ICLR 2024
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
ICLR 2024
Human-oriented Representation Learning for Robotic Manipulation
RSS 2024
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
CVPR 2024
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
NIPS 2024
Interfacing Foundation Models' Embeddings
NIPS 2024
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
CORL 2024
Q-SLAM: Quadric Representations for Monocular SLAM
CORL 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
ICML 2024
VDT: General-purpose Video Diffusion Transformers via Mask Modeling
ICLR 2024
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
ICCV 2023
Towards Free Data Selection with General-Purpose Models
NIPS 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
NIPS 2023
Doubly-Robust Self-Training
NIPS 2023
Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties
NIPS 2023
Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners
CVPR 2023
Visual Dependency Transformers: Dependency Tree Emerges From Reversed Attention
CVPR 2023
EC2: Emergent Communication for Embodied Control
CVPR 2023
Planning with Large Language Models for Code Generation
ICLR 2023
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
ICML 2023
Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
CORL 2022
ComPhy: Compositional Physical Reasoning of Objects and Events from Videos
ICLR 2022
Learning Versatile Neural Architectures by Propagating Network Codes
ICLR 2022
LGDN: Language-Guided Denoising Network for Video-Language Modeling
NIPS 2022
DaViT: Dual Attention Vision Transformers
ECCV 2022
CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer
ICML 2022
Domain-Adaptive Few-Shot Learning
WACV 2021
A Global Occlusion-Aware Approach to Self-Supervised Monocular Visual Odometry
AAAI 2021
Compressed Video Contrastive Learning
NIPS 2021
Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw
IJCAI 2021
IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning
ICLR 2021
Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
NIPS 2021
L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing
CVPR 2021
HR-NAS: Searching Efficient High-Resolution Neural Architectures With Lightweight Transformers
CVPR 2021
Segmenting Transparent Objects in the Wild
ECCV 2020
Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking
ECCV 2020
Learning Depth-Guided Convolutions for Monocular 3D Object Detection
CVPR 2020
Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow
AAAI 2020
Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation
ECCV 2020
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization
ICCV 2019
Face-Focused Cross-Stream Network for Deception Detection in Videos
CVPR 2019
Domain-Invariant Projection Learning for Zero-Shot Recognition
NIPS 2018