Jitendra Malik
144 papers · 2006–2025 · 11 conferences · across top CS/AI conferences
Achievements
Taxonomy Completionist (13)
Hot Topic Early Bird
Renaissance Researcher (6)
Interdisciplinary Bridge
Keyword Pioneer
Keyword Trendsetter Combo (7)
Conference Loyalist (25)
Mega-Team (100)
Triple Crown
Topic Pioneer
Deep Specialist (32)
Dynamic Duo (17)
Keyword Champion (2)
Keyword Collector (551)
Trend Setter
Conference Pioneer
Century Club (144)
Prolific Year (16)
Unstoppable (13)
The Questioner (4)
Conferences
CVPR (56)
ICCV (25)
NIPS (17)
CORL (15)
ICML (10)
ECCV (7)
ICLR (6)
RSS (5)
EMNLP (1)
IJCAI (1)
NAACL (1)
Research topics
Keywords
3D reconstruction (19)
object detection (13)
pose estimation (12)
video understanding (11)
convolutional neural network (10)
representation learning (10)
depth estimation (9)
reinforcement learning (9)
transformer architecture (8)
vision transformer (8)
self-supervised learning (8)
semantic segmentation (7)
human pose estimation (7)
neural network (6)
transfer learning (6)
scene understanding (5)
instance segmentation (5)
image classification (5)
action recognition (5)
3D pose estimation (5)
Papers
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
CORL 2025
Estimating Body and Hand Motion in an Ego-sensed World
CVPR 2025
Scaling Properties of Diffusion Models For Perceptual Tasks
CVPR 2025
Poly-Autoregressive Prediction for Modeling Interactions
CVPR 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
ICML 2025
AutoEval Done Right: Using Synthetic Data for Model Evaluation
ICML 2025
RoboVerse: A Unified Platform, Benchmark and Dataset for Scalable and Generalizable Robot Learning
RSS 2025
Reconstructing People, Places, and Cameras
CVPR 2025
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
EMNLP 2025
An Empirical Study of Autoregressive Pre-training from Videos
ICCV 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
CORL 2025
Visual Imitation Enables Contextual Humanoid Control
CORL 2025
DexterityGen: Foundation Controller for Unprecedented Dexterity
RSS 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
Reconstructing Hands in 3D with Transformers
CVPR 2024
Twisting Lids Off with Two Hands
CORL 2024
xT: Nested Tokenization for Larger Context in Large Images
ICML 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models
CVPR 2024
Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots
ICLR 2024
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
RSS 2024
Lessons from Learning to Spin "Pens"
CORL 2024
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
CVPR 2024
GOAT: GO to Any Thing
RSS 2024
Humanoid Locomotion as Next Token Prediction
NIPS 2024
Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction
NAACL 2024
Adaptive Human Trajectory Prediction via Latent Corridors
ECCV 2024
What Matters to You? Towards Visual Representation Alignment for Robot Learning
ICLR 2024
On the Benefits of 3D Pose and Tracking for Human Action Recognition
CVPR 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
NIPS 2023
MAViL: Masked Audio-Video Learners
NIPS 2023
Speculative Decoding with Big Little Decoder
NIPS 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
NIPS 2023
Robot Learning with Sensorimotor Pre-training
CORL 2023
General In-hand Object Rotation with Vision and Touch
CORL 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
ICML 2023
Multi-skill Mobile Manipulation for Object Rearrangement
ICLR 2023
Navigating to Objects Specified by Images
ICCV 2023
Humans in 4D: Reconstructing and Tracking Humans with Transformers
ICCV 2023
Multiview Compressive Coding for 3D Reconstruction
CVPR 2023
Decoupling Human and Camera Motion From Videos in the Wild
CVPR 2023
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning
CVPR 2022
Reversible Vision Transformers
CVPR 2022
ABO: Dataset and Benchmarks for Real-World 3D Object Understanding
CVPR 2022
Human Mesh Recovery From Multiple Shots
CVPR 2022
Tracking People by Predicting 3D Appearance, Location and Pose
CVPR 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
Differentiable Stereopsis: Meshes From Multiple Views Using Differentiable Rendering
CVPR 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
CVPR 2022
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
CVPR 2022
Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging
ICML 2022
In-Hand Object Rotation via Rapid Motor Adaptation
CORL 2022
Coupling Vision and Proprioception for Navigation of Legged Robots
CVPR 2022
Real-World Robot Learning with Masked Visual Pre-training
CORL 2022
Legged Locomotion in Challenging Terrains using Egocentric Vision
CORL 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
CVPR 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
NIPS 2022
Uncertainty Sets for Image Classifiers using Conformal Prediction
ICLR 2021
Reconstructing Hand-Object Interactions in the Wild
ICCV 2021
Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans
ICCV 2021
Multiscale Vision Transformers
ICCV 2021
From Goals, Waypoints & Paths to Long Term Human Trajectory Forecasting
ICCV 2021
Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots
CORL 2021
Tracking People with 3D Representations
NIPS 2021
Active 3D Shape Reconstruction from Vision and Touch
NIPS 2021
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
NIPS 2021
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
NIPS 2021
RMA: Rapid Motor Adaptation for Legged Robots
RSS 2021
Differentiable Spatial Planning using Transformers
ICML 2021
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks
ICLR 2021
It is not the Journey but the Destination: Endpoint Conditioned Trajectory Prediction
ECCV 2020
3D Shape Reconstruction from Vision and Touch
NIPS 2020
Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation
CORL 2020
Robust Learning Through Cross-Task Consistency
CVPR 2020
Long-term Human Motion Prediction with Scene Context
ECCV 2020
Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks
ECCV 2020
Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild
ECCV 2020
Shape and Viewpoint without Keypoints
ECCV 2020
Deep Isometric Learning for Visual Recognition
ICML 2020
Which Tasks Should Be Learned Together in Multi-task Learning?
ICML 2020
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
ICCV 2019
Approximate Feature Collisions in Neural Nets
NIPS 2019
Learning Individual Styles of Conversational Gesture
CVPR 2019
Learning Independent Object Motion From Unlabelled Stereoscopic Videos
CVPR 2019
Learning 3D Human Dynamics From Video
CVPR 2019
Non-Adversarial Image Synthesis With Generative Latent Nearest Neighbors
CVPR 2019
Taskonomy: Disentangling Task Transfer Learning
IJCAI 2019
Learning Navigation Subroutines from Egocentric Videos
CORL 2019
Learning to Navigate Using Mid-Level Visual Priors
CORL 2019
Mesh R-CNN
ICCV 2019
Habitat: A Platform for Embodied AI Research
ICCV 2019
Diverse Image Synthesis From Semantic Layouts via Conditional IMLE
ICCV 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
ICCV 2019
SlowFast Networks for Video Recognition
ICCV 2019
Predicting 3D Human Dynamics From Video
ICCV 2019
Combining Optimal Control and Learning for Visual Navigation in Novel Environments
CORL 2019
From Lifestyle Vlogs to Everyday Interactions
CVPR 2018
End-to-End Recovery of Human Shape and Pose
CVPR 2018
Zero-Shot Visual Imitation
ICLR 2018
Visual Memory for Robust Path Following
NIPS 2018
Factoring Shape, Pose, and Layout From the 2D Image of a 3D Scene
CVPR 2018
Taskonomy: Disentangling Task Transfer Learning
CVPR 2018
Multi-View Consistency as Supervisory Signal for Learning Shape and Pose Prediction
CVPR 2018
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions
CVPR 2018
Learning Category-Specific Mesh Reconstruction from Image Collections
ECCV 2018
Gibson Env: Real-World Perception for Embodied Agents
CVPR 2018
Feedback Networks
CVPR 2017
Learning Shape Abstractions by Assembling Volumetric Primitives
CVPR 2017
Multi-View Supervision for Single-View Reconstruction via Differentiable Ray Consistency
CVPR 2017
Cognitive Mapping and Planning for Visual Navigation
CVPR 2017
Learning a Multi-View Stereo Machine
NIPS 2017
What Will Happen Next? Forecasting Player Moves in Sports Videos
ICCV 2017
Fast k-Nearest Neighbour Search via Prioritized DCI
ICML 2017
Cross Modal Distillation for Supervision Transfer
CVPR 2016
Human Pose Estimation With Iterative Error Feedback
CVPR 2016
Iterative Instance Segmentation
CVPR 2016
Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing
ICML 2016
Learning to Poke by Poking: Experiential Learning of Intuitive Physics
NIPS 2016
Category-Specific Object Reconstruction From a Single Image
CVPR 2015
Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence
CVPR 2015
Viewpoints and Keypoints
CVPR 2015
Contextual Action Recognition With R*CNN
ICCV 2015
Finding Action Tubes
CVPR 2015
Hypercolumns for Object Segmentation and Fine-Grained Localization
CVPR 2015
Deformable Part Models are Convolutional Neural Networks
CVPR 2015
DeepBox: Learning Objectness With Convolutional Networks
ICCV 2015
Actions and Attributes From Wholes and Parts
ICCV 2015
Aligning 3D Models to RGB-D Images of Cluttered Scenes
CVPR 2015
Learning to Segment Moving Objects in Videos
CVPR 2015
Virtual View Networks for Object Reconstruction
CVPR 2015
Amodal Completion and Size Constancy in Natural Scenes
ICCV 2015
Pose Induction for Novel Object Categories
ICCV 2015
Learning to See by Moving
ICCV 2015
Recurrent Network Models for Human Dynamics
ICCV 2015
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
CVPR 2014
Using k-Poselets for Detecting People and Localizing Their Keypoints
CVPR 2014
Multiscale Combinatorial Grouping
CVPR 2014
Grouping-Based Low-Rank Trajectory Completion and 3D Reconstruction
NIPS 2014
Depth from Combining Defocus and Correspondence Using Light-Field Cameras
ICCV 2013
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
CVPR 2013
Intrinsic Scene Properties from a Single RGB-D Image
CVPR 2013
Training Deformable Part Models with Decorrelated Features
ICCV 2013
Volumetric Semantic Segmentation Using Pyramid Context Features
ICCV 2013
Articulated Pose Estimation Using Discriminative Armlet Classifiers
CVPR 2013
Image Retrieval and Classification Using Local Distance Functions
NIPS 2006