Tanmay Gupta
19 papers · 2015–2025 · 7 conferences · across top CS/AI conferences
Achievements
Academic Marathon (10)
Conference Polyglot (7)
Interdisciplinary Bridge
Keyword Pioneer
Cross-Pollinator (13)
Renaissance Researcher (7)
Dynamic Duo (10)
Mega-Team (50)
Topic Evolution
Century Club (19)
Prolific Year (5)
Conference Pioneer
Keyword Collector (74)
Unstoppable (9)
Conferences
CVPR (6)
ECCV (4)
ICCV (3)
ACL (2)
NeurIPS (2)
ICML (1)
NAACL (1)
Keywords
zero-shot learning (4)
vision-language model (4)
multimodal learning (3)
visual question answering (3)
large language model (3)
data augmentation (2)
in-context learning (2)
visual reasoning (2)
multi-task learning (1)
pose estimation (1)
transformer architecture (1)
object detection (1)
transfer learning (1)
imitation learning (1)
reinforcement learning (1)
code generation (1)
image captioning (1)
model selection (1)
video understanding (1)
action recognition (1)
Papers
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
ACL 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
Selective “Selective Prediction”: Reducing Unnecessary Abstention in Vision-Language Reasoning
ACL 2024
WebWISE: Unlocking Web Interface Control for LLMs via Sequential Exploration
NAACL 2024
Task Me Anything
NeurIPS 2024
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
CVPR 2024
m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
ECCV 2024
Visual Programming: Compositional Visual Reasoning Without Training
CVPR 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing
NeurIPS 2023
Webly Supervised Concept Expansion for General Purpose Vision Models
ECCV 2022
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture
CVPR 2022
Visual Semantic Role Labeling for Video Understanding
CVPR 2021
Learning Curves for Analysis of Deep Networks
ICML 2021
Contrastive Learning for Weakly Supervised Phrase Grounding
ECCV 2020
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques
ICCV 2019
ViCo: Word Embeddings From Visual Co-Occurrences
ICCV 2019
Imagine This! Scripts to Compositions to Videos
ECCV 2018
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
ICCV 2017
Completing 3D Object Shape From One Depth Image
CVPR 2015