Jitendra Malik

144 papers · 2006–2025 · 11 top CS/AI conferences

Achievements

🗺️ Taxonomy Completionist (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🐣 Hot Topic Early Bird 🌟 Keyword Trendsetter Combo (7) 🏠 Conference Loyalist (25) 👥 Mega-Team (100) 👑 Triple Crown 🌱 Topic Pioneer 🔬 Deep Specialist (32) 🤝 Dynamic Duo (17) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (551) 📈 Trend Setter 🚀 Conference Pioneer 💎 Century Club (144) ⚡ Prolific Year (16) 🔥 Unstoppable (13) ❓ The Questioner (4)

Conferences

CVPR (56) ICCV (25) NIPS (17) CORL (15) ICML (10) ECCV (7) ICLR (6) RSS (5) EMNLP (1) IJCAI (1) NAACL (1)

Top co-authors

Angjoo Kanazawa (17) Karttikeya Mangalam (16) Trevor Darrell (11) Saurabh Gupta (11) Christoph Feichtenhofer (11) Shubham Tulsiani (10) Pablo Arbelaez (10) Pieter Abbeel (9) Georgia Gkioxari (9) Alexander Sax (8)

Research topics

Computer Vision (1) Robotics (1)

Keywords

3d reconstruction (19) object detection (13) pose estimation (12) video understanding (11) convolutional neural network (10) representation learning (10) depth estimation (9) reinforcement learning (9) transformer architecture (8) vision transformer (8) self-supervised learning (8) semantic segmentation (7) human pose estimation (7) neural network (6) transfer learning (6) scene understanding (5) instance segmentation (5) image classification (5) action recognition (5) 3d pose estimation (5)

Papers

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids CORL 2025
Estimating Body and Hand Motion in an Ego-sensed World CVPR 2025
Scaling Properties of Diffusion Models For Perceptual Tasks CVPR 2025
Poly-Autoregressive Prediction for Modeling Interactions CVPR 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction ICML 2025
AutoEval Done Right: Using Synthetic Data for Model Evaluation ICML 2025
RoboVerse: A Unified Platform, Benchmark and Dataset for Scalable and Generalizable Robot Learning RSS 2025
Reconstructing People, Places, and Cameras CVPR 2025
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time EMNLP 2025
An Empirical Study of Autoregressive Pre-training from Videos ICCV 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio CORL 2025
Visual Imitation Enables Contextual Humanoid Control CORL 2025
DexterityGen: Foundation Controller for Unprecedented Dexterity RSS 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024
Reconstructing Hands in 3D with Transformers CVPR 2024
Twisting Lids Off with Two Hands CORL 2024
xT: Nested Tokenization for Larger Context in Large Images ICML 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models CVPR 2024
Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots ICLR 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset RSS 2024
Lessons from Learning to Spin “Pens” CORL 2024
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning CVPR 2024
GOAT: GO to Any Thing RSS 2024
Humanoid Locomotion as Next Token Prediction NIPS 2024
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction NAACL 2024
Adaptive Human Trajectory Prediction via Latent Corridors ECCV 2024
What Matters to You? Towards Visual Representation Alignment for Robot Learning ICLR 2024
On the Benefits of 3D Pose and Tracking for Human Action Recognition CVPR 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? NIPS 2023
MAViL: Masked Audio-Video Learners NIPS 2023
Speculative Decoding with Big Little Decoder NIPS 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding NIPS 2023
Robot Learning with Sensorimotor Pre-training CORL 2023
General In-hand Object Rotation with Vision and Touch CORL 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles ICML 2023
Multi-skill Mobile Manipulation for Object Rearrangement ICLR 2023
Navigating to Objects Specified by Images ICCV 2023
Humans in 4D: Reconstructing and Tracking Humans with Transformers ICCV 2023
Multiview Compressive Coding for 3D Reconstruction CVPR 2023
Decoupling Human and Camera Motion From Videos in the Wild CVPR 2023
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning CVPR 2022
Reversible Vision Transformers CVPR 2022
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding CVPR 2022
Human Mesh Recovery From Multiple Shots CVPR 2022
Tracking People by Predicting 3D Appearance, Location and Pose CVPR 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video CVPR 2022
Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering CVPR 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection CVPR 2022
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity CVPR 2022
Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging ICML 2022
In-Hand Object Rotation via Rapid Motor Adaptation CORL 2022
Coupling Vision and Proprioception for Navigation of Legged Robots CVPR 2022
Real-World Robot Learning with Masked Visual Pre-training CORL 2022
Legged Locomotion in Challenging Terrains using Egocentric Vision CORL 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition CVPR 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition NIPS 2022
Uncertainty Sets for Image Classifiers using Conformal Prediction ICLR 2021
Reconstructing Hand-Object Interactions in the Wild ICCV 2021
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans ICCV 2021
Multiscale Vision Transformers ICCV 2021
From Goals, Waypoints & Paths to Long Term Human Trajectory Forecasting ICCV 2021
Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots CORL 2021
Tracking People with 3D Representations NIPS 2021
Active 3D Shape Reconstruction from Vision and Touch NIPS 2021
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency NIPS 2021
Habitat 2.0: Training Home Assistants to Rearrange their Habitat NIPS 2021
RMA: Rapid Motor Adaptation for Legged Robots RSS 2021
Differentiable Spatial Planning using Transformers ICML 2021
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks ICLR 2021
It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction ECCV 2020
3D Shape Reconstruction from Vision and Touch NIPS 2020
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation CORL 2020
Robust Learning Through Cross-Task Consistency CVPR 2020
Long-term Human Motion Prediction with Scene Context ECCV 2020
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks ECCV 2020
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild ECCV 2020
Shape and Viewpoint without Keypoints ECCV 2020
Deep Isometric Learning for Visual Recognition ICML 2020
Which Tasks Should Be Learned Together in Multi-task Learning? ICML 2020
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors ICCV 2019
Approximate Feature Collisions in Neural Nets NIPS 2019
Learning Individual Styles of Conversational Gesture CVPR 2019
Learning Independent Object Motion From Unlabelled Stereoscopic Videos CVPR 2019
Learning 3D Human Dynamics From Video CVPR 2019
Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors CVPR 2019
Taskonomy: Disentangling Task Transfer Learning IJCAI 2019
Learning Navigation Subroutines from Egocentric Videos CORL 2019
Learning to Navigate Using Mid-Level Visual Priors CORL 2019
Mesh R-CNN ICCV 2019
Habitat: A Platform for Embodied AI Research ICCV 2019
Diverse Image Synthesis From Semantic Layouts via Conditional IMLE ICCV 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera ICCV 2019
SlowFast Networks for Video Recognition ICCV 2019
Predicting 3D Human Dynamics From Video ICCV 2019
Combining Optimal Control and Learning for Visual Navigation in Novel Environments CORL 2019
From Lifestyle Vlogs to Everyday Interactions CVPR 2018
End-to-End Recovery of Human Shape and Pose CVPR 2018
Zero-Shot Visual Imitation ICLR 2018
Visual Memory for Robust Path Following NIPS 2018
Factoring Shape, Pose, and Layout From the 2D Image of a 3D Scene CVPR 2018
Taskonomy: Disentangling Task Transfer Learning CVPR 2018
Multi-View Consistency as Supervisory Signal for Learning Shape and Pose Prediction CVPR 2018
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions CVPR 2018
Learning Category-Specific Mesh Reconstruction from Image Collections ECCV 2018
Gibson Env: Real-World Perception for Embodied Agents CVPR 2018
Feedback Networks CVPR 2017
Learning Shape Abstractions by Assembling Volumetric Primitives CVPR 2017
Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency CVPR 2017
Cognitive Mapping and Planning for Visual Navigation CVPR 2017
Learning a Multi-View Stereo Machine NIPS 2017
What Will Happen Next? Forecasting Player Moves in Sports Videos ICCV 2017
Fast k-Nearest Neighbour Search via Prioritized DCI ICML 2017
Cross Modal Distillation for Supervision Transfer CVPR 2016
Human Pose Estimation With Iterative Error Feedback CVPR 2016
Iterative Instance Segmentation CVPR 2016
Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing ICML 2016
Learning to Poke by Poking: Experiential Learning of Intuitive Physics NIPS 2016
Category-Specific Object Reconstruction From a Single Image CVPR 2015
Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence CVPR 2015
Viewpoints and Keypoints CVPR 2015
Contextual Action Recognition With R*CNN ICCV 2015
Finding Action Tubes CVPR 2015
Hypercolumns for Object Segmentation and Fine-Grained Localization CVPR 2015
Deformable Part Models are Convolutional Neural Networks CVPR 2015
DeepBox: Learning Objectness With Convolutional Networks ICCV 2015
Actions and Attributes From Wholes and Parts ICCV 2015
Aligning 3D Models to RGB-D Images of Cluttered Scenes CVPR 2015
Learning to Segment Moving Objects in Videos CVPR 2015
Virtual View Networks for Object Reconstruction CVPR 2015
Amodal Completion and Size Constancy in Natural Scenes ICCV 2015
Pose Induction for Novel Object Categories ICCV 2015
Learning to See by Moving ICCV 2015
Recurrent Network Models for Human Dynamics ICCV 2015
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation CVPR 2014
Using k-Poselets for Detecting People and Localizing Their Keypoints CVPR 2014
Multiscale Combinatorial Grouping CVPR 2014
Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction NIPS 2014
Depth from Combining Defocus and Correspondence Using Light-Field Cameras ICCV 2013
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images CVPR 2013
Intrinsic Scene Properties from a Single RGB-D Image CVPR 2013
Training Deformable Part Models with Decorrelated Features ICCV 2013
Volumetric Semantic Segmentation Using Pyramid Context Features ICCV 2013
Articulated Pose Estimation Using Discriminative Armlet Classifiers CVPR 2013
Image Retrieval and Classification Using Local Distance Functions NIPS 2006