Harsh Agrawal

18 papers · 2016–2025 · 9 top CS/AI conferences

Achievements

🏃 Academic Marathon (9) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (9) 🐝 Cross-Pollinator (10)

🗺️ Taxonomy Completionist (32) 🤝 Dynamic Duo (11) ❓ The Questioner 🗃️ Keyword Collector (63) 🔥 Unstoppable (7) 💎 Century Club (18)

Conferences

ICCV (5) CVPR (2) ECCV (2) EMNLP (2) ICLR (2) NIPS (2) AAAI (1) ACL (1) UAI (1)

Top co-authors

Dhruv Batra (11) Devi Parikh (5) Andrew Szot (4) Bogdan Mazoure (3) Peter Anderson (3) Yash Kant (3) Alexander Toshev (3) Yinfei Yang (2) Abhinav Moudgil (2) Zhe Gan (2)

Keywords

object detection (3) embodied ai (2) multimodal large language model (2) data augmentation (2) multimodal learning (2) vision-language navigation (1) image generation (1) reinforcement learning (1) visual question answering (1) image-to-image translation (1) knowledge distillation (1) few-shot learning (1) dialogue generation (1) scene understanding (1) depth estimation (1) image captioning (1) visual grounding (1) transfer learning (1) zero-shot learning (1) contrastive learning (1)

Papers

UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents ICCV 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms ICLR 2025
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons CVPR 2025
Large Language Models as Generalizable Policies for Embodied Tasks ICLR 2024
Grounding Multimodal Large Language Models in Actions NIPS 2024
Multimodal Persona Based Generation of Comic Dialogs ACL 2023
Simple and Effective Synthesis of Indoor 3D Scenes AAAI 2023
Housekeep: Tidying Virtual Households Using Commonsense Reasoning ECCV 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation NIPS 2021
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation ICCV 2021
Contrast and Classify: Training Robust VQA Models ICCV 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination UAI 2021
Spatially Aware Multimodal Transformers for TextVQA ECCV 2020
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning ICCV 2019
nocaps: novel object captioning at scale ICCV 2019
Object-Proposal Evaluation Protocol is 'Gameable' CVPR 2016
Sort Story: Sorting Jumbled Images and Captions into Stories EMNLP 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions? EMNLP 2016