Rohit Girdhar

29 papers · 2017–2025 · 6 conferences · across top CS/AI conferences

Achievements

🌈 Renaissance Researcher (6) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (6) 🏃 Academic Marathon (8) 🗺️ Taxonomy Completionist (51) 🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🤝 Dynamic Duo (16) 🧬 Topic Evolution 👥 Mega-Team (85) 🏆 Keyword Champion (2) 🗃️ Keyword Collector (112) ⚡ Prolific Year (6) 💎 Century Club (29) 🔥 Unstoppable (9)

Conferences

CVPR (18) ICCV (5) ECCV (2) ICLR (2) ICML (1) NIPS (1)

Top co-authors

Ishan Misra (16) Armand Joulin (7) Deva Ramanan (5) Mannat Singh (5) Lorenzo Torresani (4) Kristen Grauman (4) Kalyan Vasudev Alwala (3) Kumar Ashutosh (3) Xudong Wang (3) Philipp Krähenbühl (2)

Keywords

self-supervised learning (5) video understanding (5) zero-shot learning (4) multimodal learning (4) semantic segmentation (4) instance segmentation (4) object detection (4) multi-modal learning (3) contrastive learning (3) action recognition (3) point cloud (3) unsupervised learning (3) attention mechanism (3) computer vision (2) transfer learning (2) representation learning (2) temporal modeling (2) audio-visual learning (2) transformer architecture (2) 3d vision (2)

Papers

MotiF: Making Text Count in Image Animation with Motion Focal Loss · CVPR 2025
LLMs can see and hear without any training · ICML 2025
Generating Illustrated Instructions · CVPR 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos · CVPR 2024
InstanceDiffusion: Instance-level Control for Image Generation · CVPR 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation · CVPR 2024
Factorizing Text-to-Video Generation by Explicit Image Conditioning · ECCV 2024
Learning Video Representations From Large Language Models · CVPR 2023
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining · ICCV 2023
OmniMAE: Single Model Masked Pretraining on Images and Videos · CVPR 2023
ImageBind: One Embedding Space To Bind Them All · CVPR 2023
HierVL: Learning Hierarchical Video-Language Embeddings · CVPR 2023
Cut and Learn for Unsupervised Object Detection and Instance Segmentation · CVPR 2023
Masked-Attention Mask Transformer for Universal Image Segmentation · CVPR 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video · CVPR 2022
Detecting Twenty-Thousand Classes Using Image-Level Supervision · ECCV 2022
Omnivore: A Single Model for Many Visual Modalities · CVPR 2022
An End-to-End Transformer Model for 3D Object Detection · ICCV 2021
Anticipative Video Transformer · ICCV 2021
3D Spatial Recognition Without Spatially Labeled 3D · CVPR 2021
Self-Supervised Pretraining of 3D Features on Any Point-Cloud · ICCV 2021
MetaPix: Few-Shot Video Retargeting · ICLR 2020
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning · ICLR 2020
Video Action Transformer Network · CVPR 2019
DistInit: Learning Video Representations Without a Single Labeled Video · ICCV 2019
Detect-and-Track: Efficient Pose Estimation in Videos · CVPR 2018
ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification · CVPR 2017
Binge Watching: Scaling Affordance Learning From Sitcoms · CVPR 2017
Attentional Pooling for Action Recognition · NIPS 2017