Makarand Tapaswi

30 papers · 2013–2026 · 9 conferences · across top CS/AI conferences

Achievements

🌈 Renaissance Researcher (10) 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (13) 🌍 Conference Polyglot (9) 🗺️ Taxonomy Completionist (74)

🐣 Hot Topic Early Bird 🐝 Cross-Pollinator (15) 🌟 Keyword Trendsetter Combo (4) 🏆 Keyword Champion (2) 🧬 Topic Evolution 📈 Trend Setter ⚡ Prolific Year (5) 🚀 Conference Pioneer 💎 Century Club (30) 🔥 Unstoppable (14) ❓ The Questioner 🗃️ Keyword Collector (158)

Conferences

CVPR (14) ICCV (4) WACV (3) CORL (2) EMNLP (2) NIPS (2) ECCV (1) ICLR (1) NAACL (1)

Top co-authors

Ivan Laptev (8) Sanja Fidler (6) Rainer Stiefelhagen (5) Cordelia Schmid (5) Shizhe Chen (5) Pierre-Louis Guhur (5) Vineet Gandhi (3) Martin Bauml (3) Zeeshan Khan (3) Kawshik Manikantan (2)

Keywords

video understanding (11) multimodal learning (8) scene understanding (3) zero-shot learning (3) video retrieval (2) embodied agent (2) video-language model (2) question answering (2) video captioning (2) named entity recognition (2) 3d vision (2) embedding learning (2) coreference resolution (2) scene graph (2) unsupervised learning (2) visual question answering (2) weakly supervised learning (2) few-shot learning (2) audio-visual learning (2) story understanding (2)

Papers

STRinGS: Selective Text Refinement in Gaussian Splatting WACV 2026
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment CVPR 2025
Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability WACV 2025
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs NAACL 2025
What You See is What You Ask: Evaluating Audio Descriptions EMNLP 2025
Previously on ... From Recaps to Story Summarization CVPR 2024
MICap: A Unified Model for Identity-Aware Movie Descriptions CVPR 2024
Major Entity Identification: A Generalizable Alternative to Coreference Resolution EMNLP 2024
How You Feelin'? Learning Emotions and Mental States in Movie Scenes CVPR 2023
Test of Time: Instilling Video-Language Models With a Sense of Time CVPR 2023
Unsupervised Audio-Visual Lecture Segmentation WACV 2023
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation CVPR 2022
Grounded Video Situation Recognition NIPS 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding NIPS 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation ECCV 2022
Instruction-driven history-aware policies for robotic manipulations CORL 2022
Airbert: In-Domain Pretraining for Vision-and-Language Navigation ICCV 2021
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos CORL 2020
Learning Interactions and Relationships Between Movie Characters CVPR 2020
Visual Reasoning by Progressive Module Networks ICLR 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips ICCV 2019
Video Face Clustering With Unknown Number of Clusters ICCV 2019
MovieGraphs: Towards Understanding Human-Centric Situations From Videos CVPR 2018
Now You Shake Me: Towards Automatic 4D Cinema CVPR 2018
Situation Recognition With Graph Neural Networks ICCV 2017
MovieQA: Understanding Stories in Movies Through Question-Answering CVPR 2016
Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning CVPR 2016
Book2Movie: Aligning Video Scenes With Book Chapters CVPR 2015
StoryGraphs: Visualizing Character Interactions as a Timeline CVPR 2014
Semi-supervised Learning with Constraints for Person Identification in Multimedia Data CVPR 2013