Dhruv Batra
122 papers · 2011–2024 · 15 top CS/AI conferences
Achievements
Taxonomy Completionist (17) · Keyword Pioneer · Interdisciplinary Bridge · Renaissance Researcher (7) · Hot Topic Early Bird
Academic Marathon (13)
Renaissance Researcher (7)
Interdisciplinary Bridge
Conference Loyalist (31)
Keyword Trendsetter Combo (12)
Dynamic Duo (58)
Triple Crown
Topic Evolution
Grand Slam
Mega-Team (85)
Deep Specialist (24)
Keyword Champion (7)
Unstoppable (14)
The Questioner (5)
Prolific Year (13)
Century Club (122)
Keyword Collector (394)
Trend Setter
Conference Pioneer
Conferences
CVPR (31)
ICCV (18)
NIPS (16)
EMNLP (12)
CORL (9)
ECCV (9)
ICLR (7)
ICML (6)
NAACL (4)
AISTATS (3)
AAAI (2)
IJCAI (2)
ACL (1)
IJCNLP (1)
RSS (1)
Keywords
reinforcement learning (18)
visual question answering (16)
embodied ai (13)
multimodal learning (9)
scene understanding (9)
dialogue system (8)
visual navigation (7)
image captioning (7)
visual dialog (7)
robot navigation (7)
object detection (6)
embodied question answering (6)
object navigation (5)
imitation learning (5)
structured prediction (5)
embodied navigation (5)
object goal navigation (4)
visual grounding (4)
transfer learning (4)
domain adaptation (4)
Papers
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
CVPR 2024
OpenEQA: Embodied Question Answering in the Era of Foundation Models
CVPR 2024
Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots
ICLR 2024
Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
NIPS 2024
GOAT: GO to Any Thing
RSS 2024
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
CVPR 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
CVPR 2024
Adaptive Coordination in Social Embodied Rearrangement
ICML 2023
Skill Transformer: A Monolithic Policy for Mobile Manipulation
ICCV 2023
BC-IRL: Learning Generalizable Reward Functions from Demonstrations
ICLR 2023
Emergence of Maps in the Memories of Blind Navigation Agents
ICLR 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
NIPS 2023
Habitat-Matterport 3D Semantics Dataset
CVPR 2023
FindThis: Language-Driven Object Disambiguation in Indoor Environments
CORL 2023
HomeRobot: Open-Vocabulary Mobile Manipulation
CORL 2023
Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-per-Second
CVPR 2023
Simple and Effective Synthesis of Indoor 3D Scenes
AAAI 2023
PIRLNav: Pretraining With Imitation and RL Finetuning for ObjectNav
CVPR 2023
Navigating to Objects Specified by Images
ICCV 2023
Is Mapping Necessary for Realistic PointGoal Navigation?
CVPR 2022
Habitat-Web: Learning Embodied Object-Search Strategies From Human Demonstrations at Scale
CVPR 2022
VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement
NIPS 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
NIPS 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
NIPS 2022
Housekeep: Tidying Virtual Households Using Commonsense Reasoning
ECCV 2022
Cross-Domain Transfer via Semantic Skill Imitation
CORL 2022
Rethinking Sim2Real: Lower Fidelity Simulation Leads to Higher Sim2Real Transfer in Navigation
CORL 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
CVPR 2022
Episodic Memory Question Answering
CVPR 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
NIPS 2021
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
NIPS 2021
Auxiliary Tasks and Exploration Enable ObjectGoal Navigation
ICCV 2021
Large Batch Simulation for Deep Reinforcement Learning
ICLR 2021
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
ICCV 2021
Contrast and Classify: Training Robust VQA Models
ICCV 2021
THDA: Treasure Hunt Data Augmentation for Semantic Navigation
ICCV 2021
Waypoint Models for Instruction-Guided Navigation in Continuous Environments
ICCV 2021
Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views
AAAI 2021
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
NAACL 2021
Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents
CORL 2020
Where Are You? Localization from Embodied Dialog
EMNLP 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
NIPS 2020
Auxiliary Tasks Speed Up Learning Point Goal Navigation
CORL 2020
Sim-to-Real Transfer for Vision-and-Language Navigation
CORL 2020
IR-VIC: Unsupervised Discovery of Sub-goals for Transfer in RL
IJCAI 2020
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
ICLR 2020
Embodied Multimodal Multitask Learning
IJCAI 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
ECCV 2020
Spatially Aware Multimodal Transformers for TextVQA
ECCV 2020
Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
ECCV 2020
Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation
ECCV 2020
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
ECCV 2020
Modeling the Long Term Future in Model-Based Reinforcement Learning
ICLR 2019
TarMAC: Targeted Multi-Agent Communication
ICML 2019
Counterfactual Visual Explanations
ICML 2019
Trainable Decoding of Sets of Sequences for Neural Sequence Models
ICML 2019
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering
ICML 2019
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
ACL 2019
Habitat: A Platform for Embodied AI Research
ICCV 2019
Improving Generative Visual Dialog by Answering Diverse Questions
EMNLP 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
ICCV 2019
Embodied Amodal Recognition: Learning to Move to Perceive Objects
ICCV 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
ICCV 2019
Chasing Ghosts: Instruction Following as Bayesian State Tracking
NIPS 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
NIPS 2019
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
ICCV 2019
nocaps: novel object captioning at scale
ICCV 2019
Improving Generative Visual Dialog by Answering Diverse Questions
IJCNLP 2019
Multi-Target Embodied Question Answering
CVPR 2019
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception
CVPR 2019
Audio Visual Scene-Aware Dialog
CVPR 2019
Towards VQA Models That Can Read
CVPR 2019
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
NAACL 2019
Neural Baby Talk
CVPR 2018
Neural Modular Control for Embodied Question Answering
CORL 2018
Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
CORL 2018
Embodied Question Answering
CVPR 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
CVPR 2018
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
ECCV 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
ECCV 2018
Graph R-CNN for Scene Graph Generation
ECCV 2018
Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples
ICLR 2018
Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations
ICML 2018
Deal or No Deal? End-to-End Learning of Negotiation Dialogues
EMNLP 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
NIPS 2017
Visual Dialog
CVPR 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning
CVPR 2017
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
CVPR 2017
Counting Everyday Objects in Everyday Scenes
CVPR 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
EMNLP 2017
Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog
EMNLP 2017
ParlAI: A Dialog Research Software Platform
EMNLP 2017
Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization
ICCV 2017
Learning Cooperative Visual Dialog Agents With Deep Reinforcement Learning
ICCV 2017
Joint Unsupervised Learning of Deep Representations and Image Clusters
CVPR 2016
Analyzing the Behavior of Visual Question Answering Models
EMNLP 2016
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
EMNLP 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?
EMNLP 2016
Sort Story: Sorting Jumbled Images and Captions into Stories
EMNLP 2016
Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions
EMNLP 2016
Yin and Yang: Balancing and Answering Binary Visual Questions
CVPR 2016
We Are Humor Beings: Understanding and Predicting Visual Humor
CVPR 2016
Object-Proposal Evaluation Protocol is 'Gameable'
CVPR 2016
Hierarchical Question-Image Co-Attention for Visual Question Answering
NIPS 2016
A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories
NAACL 2016
Visual Storytelling
NAACL 2016
Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
NIPS 2016
Active Learning for Structured Probabilistic Models With Histogram Approximation
CVPR 2015
SubmodBoxes: Near-Optimal Search for a Set of Diverse Object Proposals
NIPS 2015
VQA: Visual Question Answering
ICCV 2015
Optimizing Expected Intersection-Over-Union With Candidate-Constrained CRFs
ICCV 2015
VIP: Finding Important People in Images
CVPR 2015
Efficiently Enforcing Diversity in Multi-Output Structured Prediction
AISTATS 2014
Empirical Minimum Bayes Risk Prediction: How to Extract an Extra Few % Performance from Vision Models with Just Three More Parameters
CVPR 2014
Submodular meets Structured: Finding Diverse Subsets in Exponentially-Large Structured Item Sets
NIPS 2014
Multimodal Learning in Loosely-organized Web Images
CVPR 2014
Discriminative Re-ranking of Diverse Segmentations
CVPR 2013
DivMCuts: Faster Training of Structural SVMs with Diverse M-Best Cutting-Planes
AISTATS 2013
A Systematic Exploration of Diversity in Machine Translation
EMNLP 2013
Group Norm for Learning Structured SVMs with Unstructured Latent Variables
ICCV 2013
Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
NIPS 2012
Tighter Relaxations for MAP-MRF Inference: A Local Primal-Dual Gap based Separation Algorithm
AISTATS 2011