Aniruddha Kembhavi
61 papers · 2016–2025 · 9 top CS/AI conferences
Achievements
🌍 Conference Polyglot (9)
🏃 Academic Marathon (9)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (11)
🏠 Conference Loyalist (28)
🤝 Dynamic Duo (26)
👥 Mega-Team (50)
🔬 Deep Specialist (14)
🏆 Keyword Champion (13)
🚀 Conference Pioneer
🗃️ Keyword Collector (241)
📈 Trend Setter
⚡ Prolific Year (13)
🔥 Unstoppable (10)
❓ The Questioner (3)
💎 Century Club (61)
Conferences
CVPR (28)
NIPS (12)
ECCV (5)
ICCV (5)
ICLR (4)
EMNLP (3)
ACL (2)
CORL (1)
IJCNLP (1)
Research topics
Keywords
embodied ai (13)
vision-language model (8)
visual question answering (8)
multimodal learning (6)
visual navigation (6)
zero-shot learning (6)
reinforcement learning (6)
scene understanding (5)
transfer learning (5)
neural network (5)
object manipulation (3)
procedural generation (3)
embodied agent (3)
convolutional neural network (3)
contrastive learning (3)
image generation (3)
transformer architecture (3)
imitation learning (3)
self-supervised learning (3)
image captioning (3)
Papers
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams
CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
CVPR 2025
One Diffusion to Generate Them All
CVPR 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
CVPR 2024
From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos
NIPS 2024
Task Me Anything
NIPS 2024
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
CORL 2024
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
CVPR 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action
CVPR 2024
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024
Iterated Learning Improves Compositionality in Large Vision-Language Models
CVPR 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
CVPR 2024
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
ICLR 2024
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
NIPS 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing
NIPS 2023
Objaverse-XL: A Universe of 10M+ 3D Objects
NIPS 2023
Neural Priming for Sample-Efficient Adaptation
NIPS 2023
Visual Programming: Compositional Visual Reasoning Without Training
CVPR 2023
EXCALIBUR: Encouraging and Evaluating Embodied Exploration
CVPR 2023
Objaverse: A Universe of Annotated 3D Objects
CVPR 2023
Phone2Proc: Bringing Robust Robots Into Our Chaotic World
CVPR 2023
Scene Graph Contrastive Learning for Embodied Navigation
ICCV 2023
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
ICCV 2023
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
ICCV 2023
Neural Radiance Field Codebooks
ICLR 2023
UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks
ICLR 2023
Simple but Effective: CLIP Embeddings for Embodied AI
CVPR 2022
Ask4Help: Learning to Leverage an Expert for Embodied Tasks
NIPS 2022
🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
NIPS 2022
Webly Supervised Concept Expansion for General Purpose Vision Models
ECCV 2022
Object Manipulation via Visual Target Localization
ECCV 2022
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
CVPR 2022
What Do Navigation Agents Learn About Their Environment?
CVPR 2022
Visual Room Rearrangement
CVPR 2021
Visual Semantic Role Labeling for Video Understanding
CVPR 2021
ManipulaTHOR: A Framework for Visual Object Manipulation
CVPR 2021
Container: Context Aggregation Networks
NIPS 2021
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
EMNLP 2021
GridToPix: Training Embodied Agents With Minimal Supervision
ICCV 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
ICCV 2021
Bridging the Imitation Gap by Adaptive Insubordination
NIPS 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
IJCNLP 2021
Learning Generalizable Visual Representations via Interactive Gameplay
ICLR 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
ACL 2021
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
CVPR 2020
Learning About Objects by Learning to Interact with Them
NIPS 2020
What's Hidden in a Randomly Weighted Neural Network?
CVPR 2020
Grounded Situation Recognition
ECCV 2020
A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks
ECCV 2020
Supermasks in Superposition
NIPS 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
EMNLP 2020
Two Body Problem: Collaborative Visual Task Completion
CVPR 2019
ELASTIC: Improving CNNs With Dynamic Scaling Policies
CVPR 2019
Structured Set Matching Networks for One-Shot Part Labeling
CVPR 2018
Imagine This! Scripts to Compositions to Videos
ECCV 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
CVPR 2018
IQA: Visual Question Answering in Interactive Environments
CVPR 2018
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
CVPR 2017
Semantic Parsing to Probabilistic Programs for Situated Question Answering
EMNLP 2016