Tanmay Gupta

19 papers · 2015–2025 · 7 top CS/AI conferences

Achievements

🏃 Academic Marathon (10) 🌍 Conference Polyglot (7) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🐝 Cross-Pollinator (13) 🌈 Renaissance Researcher (7) 🤝 Dynamic Duo (10) 👥 Mega-Team (50) 🧬 Topic Evolution 💎 Century Club (19) ⚡ Prolific Year (5) 🚀 Conference Pioneer 🗃️ Keyword Collector (74) 🔥 Unstoppable (9)

Conferences

CVPR (6) ECCV (4) ICCV (3) ACL (2) NIPS (2) ICML (1) NAACL (1)

Top co-authors

Aniruddha Kembhavi (10) Derek Hoiem (10) Ranjay Krishna (6) Ali Farhadi (3) Christopher Clark (3) Luca Weihs (3) Mark Yatskar (3) Eli VanderBilt (3) Oscar Michel (3) Michal Shlapentokh-Rothman (2)

Keywords

zero-shot learning (4) vision-language model (4) multimodal learning (3) visual question answering (3) large language model (3) data augmentation (2) in-context learning (2) visual reasoning (2) multi-task learning (1) pose estimation (1) transformer architecture (1) object detection (1) transfer learning (1) imitation learning (1) reinforcement learning (1) code generation (1) image captioning (1) model selection (1) video understanding (1) action recognition (1)

Papers

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation · ACL 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · CVPR 2025
Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning · ACL 2024
WebWISE: Unlocking Web Interface Control for LLMs via Sequential Exploration · NAACL 2024
Task Me Anything · NIPS 2024
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World · CVPR 2024
m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks · ECCV 2024
Visual Programming: Compositional Visual Reasoning Without Training · CVPR 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing · NIPS 2023
Webly Supervised Concept Expansion for General Purpose Vision Models · ECCV 2022
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture · CVPR 2022
Visual Semantic Role Labeling for Video Understanding · CVPR 2021
Learning Curves for Analysis of Deep Networks · ICML 2021
Contrastive Learning for Weakly Supervised Phrase Grounding · ECCV 2020
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques · ICCV 2019
ViCo: Word Embeddings From Visual Co-Occurrences · ICCV 2019
Imagine This! Scripts to Compositions to Videos · ECCV 2018
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks · ICCV 2017
Completing 3D Object Shape From One Depth Image · CVPR 2015