Rohit Girdhar
29 papers · 2017–2025 · 6 conferences · across top CS/AI conferences
Achievements
Renaissance Researcher
(6)
Interdisciplinary Bridge
Conference Polyglot
(6)
Academic Marathon
(8)
Taxonomy Completionist
(51)
Keyword Pioneer
Hot Topic Early Bird
Dynamic Duo
(16)
Topic Evolution
Mega-Team
(85)
Keyword Champion
(2)
Keyword Collector
(112)
Prolific Year
(6)
Century Club
(29)
Unstoppable
(9)
Conferences
CVPR (18)
ICCV (5)
ECCV (2)
ICLR (2)
ICML (1)
NIPS (1)
Keywords
self-supervised learning
(5)
video understanding
(5)
zero-shot learning
(4)
multimodal learning
(4)
semantic segmentation
(4)
instance segmentation
(4)
object detection
(4)
multi-modal learning
(3)
contrastive learning
(3)
action recognition
(3)
point cloud
(3)
unsupervised learning
(3)
attention mechanism
(3)
computer vision
(2)
transfer learning
(2)
representation learning
(2)
temporal modeling
(2)
audio-visual learning
(2)
transformer architecture
(2)
3d vision
(2)
Papers
MotiF: Making Text Count in Image Animation with Motion Focal Loss
CVPR 2025
LLMs can see and hear without any training
ICML 2025
Generating Illustrated Instructions
CVPR 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
CVPR 2024
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
Factorizing Text-to-Video Generation by Explicit Image Conditioning
ECCV 2024
Learning Video Representations From Large Language Models
CVPR 2023
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining
ICCV 2023
OmniMAE: Single Model Masked Pretraining on Images and Videos
CVPR 2023
ImageBind: One Embedding Space To Bind Them All
CVPR 2023
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023
Cut and Learn for Unsupervised Object Detection and Instance Segmentation
CVPR 2023
Masked-Attention Mask Transformer for Universal Image Segmentation
CVPR 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
Detecting Twenty-Thousand Classes Using Image-Level Supervision
ECCV 2022
Omnivore: A Single Model for Many Visual Modalities
CVPR 2022
An End-to-End Transformer Model for 3D Object Detection
ICCV 2021
Anticipative Video Transformer
ICCV 2021
3D Spatial Recognition Without Spatially Labeled 3D
CVPR 2021
Self-Supervised Pretraining of 3D Features on Any Point-Cloud
ICCV 2021
MetaPix: Few-Shot Video Retargeting
ICLR 2020
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning
ICLR 2020
Video Action Transformer Network
CVPR 2019
DistInit: Learning Video Representations Without a Single Labeled Video
ICCV 2019
Detect-and-Track: Efficient Pose Estimation in Videos
CVPR 2018
ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification
CVPR 2017
Binge Watching: Scaling Affordance Learning From Sitcoms
CVPR 2017
Attentional Pooling for Action Recognition
NIPS 2017