Harsh Agrawal
18 papers · 2016–2025 · 9 top CS/AI conferences
Achievements
Academic Marathon (9)
Interdisciplinary Bridge
Keyword Pioneer
Conference Polyglot (9)
Cross-Pollinator (10)
Taxonomy Completionist (32)
Dynamic Duo (11)
The Questioner
Keyword Collector (63)
Unstoppable (7)
Century Club (18)
Conferences
ICCV (5)
CVPR (2)
ECCV (2)
EMNLP (2)
ICLR (2)
NIPS (2)
AAAI (1)
ACL (1)
UAI (1)
Keywords
object detection (3)
embodied ai (2)
multimodal large language model (2)
data augmentation (2)
multimodal learning (2)
vision-language navigation (1)
image generation (1)
reinforcement learning (1)
visual question answering (1)
image-to-image translation (1)
knowledge distillation (1)
few-shot learning (1)
dialogue generation (1)
scene understanding (1)
depth estimation (1)
image captioning (1)
visual grounding (1)
transfer learning (1)
zero-shot learning (1)
contrastive learning (1)
Papers
UINavBench: A Framework for Comprehensive Evaluation of Interactive Digital Agents
ICCV 2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
ICLR 2025
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
CVPR 2025
Large Language Models as Generalizable Policies for Embodied Tasks
ICLR 2024
Grounding Multimodal Large Language Models in Actions
NIPS 2024
Multimodal Persona Based Generation of Comic Dialogs
ACL 2023
Simple and Effective Synthesis of Indoor 3D Scenes
AAAI 2023
Housekeep: Tidying Virtual Households Using Commonsense Reasoning
ECCV 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
NIPS 2021
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
ICCV 2021
Contrast and Classify: Training Robust VQA Models
ICCV 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination
UAI 2021
Spatially Aware Multimodal Transformers for TextVQA
ECCV 2020
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
ICCV 2019
nocaps: novel object captioning at scale
ICCV 2019
Object-Proposal Evaluation Protocol is 'Gameable'
CVPR 2016
Sort Story: Sorting Jumbled Images and Captions into Stories
EMNLP 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks look at the same regions?
EMNLP 2016