Ishan Misra

50 papers · 2015–2025 · 8 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🏃 Academic Marathon (10) 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (8) 🗺️ Taxonomy Completionist (69) 🧭 Keyword Pioneer 🏠 Conference Loyalist (23) 🔬 Deep Specialist (14) 🤝 Dynamic Duo (16) 🧬 Topic Evolution 📈 Trend Setter 🔥 Unstoppable (11) 🗃️ Keyword Collector (166) 💎 Century Club (50) ⚡ Prolific Year (5)

Conferences

CVPR (23) ICCV (12) ECCV (3) ICLR (3) ICML (3) NeurIPS (3) NAACL (2) ACL (1)

Top co-authors

Rohit Girdhar (16) Armand Joulin (11) Piotr Bojanowski (6) Mannat Singh (6) Abhinav Gupta (6) Martial Hebert (5) Ross Girshick (4) Margaret Mitchell (4) Nicolas Ballas (4) Yann LeCun (4)

Keywords

self-supervised learning (15) representation learning (13) object detection (6) semantic segmentation (5) image classification (5) contrastive learning (5) zero-shot learning (4) transfer learning (4) instance segmentation (4) unsupervised learning (4) diffusion model (3) vision transformer (3) point cloud (3) action recognition (3) convolutional neural network (3) multimodal learning (3) video understanding (3) audio-visual learning (3) multi-modal learning (3) transformer architecture (2)

Papers

LLMs can see and hear without any training · ICML 2025
Generating Multi-Image Synthetic Data for Text-to-Image Customization · ICCV 2025
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis · CVPR 2024
InstanceDiffusion: Instance-level Control for Image Generation · CVPR 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation · CVPR 2024
Factorizing Text-to-Video Generation by Explicit Image Conditioning · ECCV 2024
Generating Illustrated Instructions · CVPR 2024
OmniMAE: Single Model Masked Pretraining on Images and Videos · CVPR 2023
ImageBind: One Embedding Space To Bind Them All · CVPR 2023
Cut and Learn for Unsupervised Object Detection and Instance Segmentation · CVPR 2023
Learning Video Representations From Large Language Models · CVPR 2023
RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data · ICLR 2023
The hidden uniform cluster prior in self-supervised learning · ICLR 2023
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining · ICCV 2023
MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses · ICML 2023
Self-Supervised Learning From Images With a Joint-Embedding Predictive Architecture · CVPR 2023
MOST: Multiple Object Localization with Self-Supervised Transformers for Object Discovery · ICCV 2023
GeneCIS: A Benchmark for General Conditional Image Similarity · CVPR 2023
Omnivore: A Single Model for Many Visual Modalities · CVPR 2022
Masked Siamese Networks for Label-Efficient Learning · ECCV 2022
A Data-Augmentation Is Worth A Thousand Samples: Analytical Moments And Sampling-Free Training · NeurIPS 2022
Detecting Twenty-Thousand Classes Using Image-Level Supervision · ECCV 2022
Frame Averaging for Invariant and Equivariant Network Design · ICLR 2022
Masked-Attention Mask Transformer for Universal Image Segmentation · CVPR 2022
Emerging Properties in Self-Supervised Vision Transformers · ICCV 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers · NeurIPS 2021
Audio-Visual Instance Discrimination with Cross-Modal Agreement · CVPR 2021
Robust Audio-Visual Instance Discrimination · CVPR 2021
3D Spatial Recognition Without Spatially Labeled 3D · CVPR 2021
An End-to-End Transformer Model for 3D Object Detection · ICCV 2021
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments With Support Samples · ICCV 2021
MDETR - Modulated Detection for End-to-End Multi-Modal Understanding · ICCV 2021
Self-Supervised Pretraining of 3D Features on Any Point-Cloud · ICCV 2021
Space-Time Crop & Attend: Improving Cross-Modal Video Representation Learning · ICCV 2021
Barlow Twins: Self-Supervised Learning via Redundancy Reduction · ICML 2021
In Defense of Grid Features for Visual Question Answering · CVPR 2020
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments · NeurIPS 2020
Self-Supervised Learning of Pretext-Invariant Representations · CVPR 2020
ClusterFit: Improving Generalization of Visual Representations · CVPR 2020
3D-RelNet: Joint Object and Relational Network for 3D Prediction · ICCV 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning · ICCV 2019
Learning by Asking Questions · CVPR 2018
Proceedings of the First Workshop on Storytelling · NAACL 2018
From Red Wine to Red Tomato: Composition With Context · CVPR 2017
Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection · ICCV 2017
Generating Natural Questions About an Image · ACL 2016
Visual Storytelling · NAACL 2016
Seeing Through the Human Reporting Bias: Visual Classifiers From Noisy Human-Centric Labels · CVPR 2016
Cross-Stitch Networks for Multi-Task Learning · CVPR 2016
Watch and Learn: Semi-Supervised Learning for Object Detectors From Video · CVPR 2015