Makarand Tapaswi
30 papers · 2013–2026 · 9 top CS/AI conferences
Achievements
Renaissance Researcher (10)
Interdisciplinary Bridge
Academic Marathon (13)
Conference Polyglot (9)
Taxonomy Completionist (74)
Hot Topic Early Bird
Cross-Pollinator (15)
Keyword Trendsetter Combo (4)
Keyword Champion (2)
Topic Evolution
Trend Setter
Prolific Year (5)
Conference Pioneer
Century Club (30)
Unstoppable (14)
The Questioner
Keyword Collector (158)
Conferences
CVPR (14)
ICCV (4)
WACV (3)
CORL (2)
EMNLP (2)
NIPS (2)
ECCV (1)
ICLR (1)
NAACL (1)
Keywords
video understanding (11)
multimodal learning (8)
scene understanding (3)
zero-shot learning (3)
video retrieval (2)
embodied agent (2)
video-language model (2)
question answering (2)
video captioning (2)
named entity recognition (2)
3d vision (2)
embedding learning (2)
coreference resolution (2)
scene graph (2)
unsupervised learning (2)
visual question answering (2)
weakly supervised learning (2)
few-shot learning (2)
audio-visual learning (2)
story understanding (2)
Papers
STRinGS: Selective Text Refinement in Gaussian Splatting
WACV 2026
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
CVPR 2025
Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability
WACV 2025
IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs
NAACL 2025
What You See is What You Ask: Evaluating Audio Descriptions
EMNLP 2025
Previously on ... From Recaps to Story Summarization
CVPR 2024
MICap: A Unified Model for Identity-Aware Movie Descriptions
CVPR 2024
Major Entity Identification: A Generalizable Alternative to Coreference Resolution
EMNLP 2024
How You Feelin'? Learning Emotions and Mental States in Movie Scenes
CVPR 2023
Test of Time: Instilling Video-Language Models With a Sense of Time
CVPR 2023
Unsupervised Audio-Visual Lecture Segmentation
WACV 2023
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation
CVPR 2022
Grounded Video Situation Recognition
NIPS 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
NIPS 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
ECCV 2022
Instruction-driven history-aware policies for robotic manipulations
CORL 2022
Airbert: In-Domain Pretraining for Vision-and-Language Navigation
ICCV 2021
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos
CORL 2020
Learning Interactions and Relationships Between Movie Characters
CVPR 2020
Visual Reasoning by Progressive Module Networks
ICLR 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
ICCV 2019
Video Face Clustering With Unknown Number of Clusters
ICCV 2019
MovieGraphs: Towards Understanding Human-Centric Situations From Videos
CVPR 2018
Now You Shake Me: Towards Automatic 4D Cinema
CVPR 2018
Situation Recognition With Graph Neural Networks
ICCV 2017
MovieQA: Understanding Stories in Movies Through Question-Answering
CVPR 2016
Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning
CVPR 2016
Book2Movie: Aligning Video Scenes With Book Chapters
CVPR 2015
StoryGraphs: Visualizing Character Interactions as a Timeline
CVPR 2014
Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
CVPR 2013