Kristen Grauman

125 papers · 2006–2026 · 9 conferences · across top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🌉 Interdisciplinary Bridge 🗺️ Taxonomy Completionist (16) 🌍 Conference Polyglot (9)

🏃 Academic Marathon (20) 🌈 Renaissance Researcher (12) 🏠 Conference Loyalist (23) 🌟 Keyword Trendsetter Combo (17) 🤝 Dynamic Duo (18) 🌱 Topic Pioneer 🧬 Topic Evolution 🏆 Keyword Champion 👥 Mega-Team (100) 🔬 Deep Specialist (26) 📈 Trend Setter 🔥 Unstoppable (17) ❓ The Questioner (2) 🚀 Conference Pioneer ⚡ Prolific Year (13) 💎 Century Club (125) 🗃️ Keyword Collector (60)

Conferences

CVPR (58) NIPS (23) ICCV (20) ECCV (15) ICML (3) ICLR (2) WACV (2) CORL (1) JMLR (1)

Top co-authors

Ziad Al-Halah (18) Tushar Nagarajan (17) Changan Chen (13) Santhosh Kumar Ramakrishnan (11) Ruohan Gao (11) Zihui Xue (10) Fei Sha (10) Sagnik Majumder (9) Kumar Ashutosh (8) Bo Xiong (6)

Research topics

Domain-Specific (1) Social (1) Core AI (1)

Keywords

video understanding (22) egocentric video (14) egocentric vision (10) audio-visual learning (8) multimodal learning (8) convolutional neural network (7) active learning (7) representation learning (7) zero-shot learning (6) reinforcement learning (6) relative attribute (6) action recognition (6) transfer learning (6) self-supervised learning (6) metric learning (6) object recognition (6) activity recognition (5) image retrieval (4) contrastive learning (4) weakly supervised learning (4)

Papers

SPOC: Spatially-Progressing Object State Change Segmentation in Video WACV 2026
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning CVPR 2025
Progress-Aware Video Frame Captioning CVPR 2025
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos CVPR 2025
FIction: 4D Future Interaction Prediction from Video CVPR 2025
ExpertAF: Expert Actionable Feedback from Video CVPR 2025
Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos ICCV 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives CVPR 2024
Detours for Navigating Instructional Videos CVPR 2024
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos CVPR 2024
Learning Object State Changes in Videos: An Open-World Perspective CVPR 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos ECCV 2024
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos ECCV 2024
HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness NIPS 2024
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation ECCV 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos CVPR 2024
Self-Supervised Visual Acoustic Matching NIPS 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding NIPS 2023
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment NIPS 2023
EgoEnv: Human-centric environment representations from egocentric video NIPS 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos NIPS 2023
EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset NIPS 2023
HierVL: Learning Hierarchical Video-Language Embeddings CVPR 2023
Egocentric Video Task Translation CVPR 2023
SpotEM: Efficient Video Search for Episodic Memory ICML 2023
Chat2Map: Efficient Scene Mapping From Multi-Ego Conversations CVPR 2023
Novel-View Acoustic Synthesis CVPR 2023
NaQ: Leveraging Narrations As Queries To Supervise Episodic Memory CVPR 2023
Single-Stage Visual Query Localization in Egocentric Videos NIPS 2023
Ego4D: Around the World in 3,000 Hours of Egocentric Video CVPR 2022
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation CVPR 2022
PONI: Potential Functions for ObjectGoal Navigation With Interaction-Free Learning CVPR 2022
Environment Predictive Coding for Visual Navigation ICLR 2022
Active Audio-Visual Separation of Dynamic Sound Sources ECCV 2022
Egocentric Activity Recognition and Localization on a 3D Map ECCV 2022
Few-Shot Audio-Visual Learning of Environment Acoustics NIPS 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning NIPS 2022
Discovering Underground Maps From Fashion WACV 2022
Visual Acoustic Matching CVPR 2022
VisualVoice: Audio-Visual Speech Separation With Cross-Modal Consistency CVPR 2021
Shaping embodied agent behavior with activity-context priors from egocentric video NIPS 2021
DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video CORL 2021
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos CVPR 2021
Semantic Audio-Visual Navigation CVPR 2021
Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback CVPR 2021
From Culture to Clothing: Discovering the World Events Behind a Century of Fashion Images ICCV 2021
Move2Hear: Active Audio-Visual Source Separation ICCV 2021
Multiview Pseudo-Labeling for Semi-Supervised Learning From Video ICCV 2021
Audio-Visual Floorplan Reconstruction ICCV 2021
Anticipative Video Transformer ICCV 2021
Learning to Set Waypoints for Audio-Visual Navigation ICLR 2021
Listen to Look: Action Recognition by Previewing Audio CVPR 2020
Proposal-based Video Completion ECCV 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation ECCV 2020
Occupancy Anticipation for Efficient Exploration and Navigation ECCV 2020
Learning Affordance Landscapes for Interaction Exploration in 3D Environments NIPS 2020
From Paris to Berlin: Discovering Fashion Style Influences Around the World CVPR 2020
ViBE: Dressing for Diverse Body Shapes CVPR 2020
SoundSpaces: Audio-Visual Navigation in 3D Environments ECCV 2020
Ego-Topo: Environment Affordances From Egocentric Video CVPR 2020
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions CVPR 2020
Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias CVPR 2020
Kernel Transformer Networks for Compact Spherical Convolution CVPR 2019
Grounded Human-Object Interaction Hotspots From Video ICCV 2019
Extreme Relative Pose Estimation for RGB-D Scans via Scene Completion CVPR 2019
Less Is More: Learning Highlight Detection From Video Duration CVPR 2019
Thinking Outside the Pool: Active Training Image Creation for Relative Attributes CVPR 2019
2.5D Visual Sound CVPR 2019
Co-Separating Sounds of Visual Objects ICCV 2019
Fashion++: Minimal Edits for Outfit Improvement ICCV 2019
SpotTune: Transfer Learning Through Adaptive Fine-Tuning CVPR 2019
Retrospective Encoders for Video Summarization ECCV 2018
Compare and Contrast: Learning Prominent Visual Differences CVPR 2018
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks CVPR 2018
BlockDrop: Dynamic Inference Paths in Residual Networks CVPR 2018
Learning Compressible 360° Video Isomers CVPR 2018
Creating Capsule Wardrobes From Fashion Images CVPR 2018
Im2Flow: Motion Hallucination From Static Images for Action Recognition CVPR 2018
VizWiz Grand Challenge: Answering Visual Questions From Blind People CVPR 2018
Learning to Separate Object Sounds by Watching Unlabeled Video ECCV 2018
Attributes as Operators: Factorizing Unseen Attribute-Object Compositions ECCV 2018
Sidekick Policy Learning for Active Visual Exploration ECCV 2018
Snap Angle Prediction for 360° Panoramas ECCV 2018
ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids ECCV 2018
Fashion Forward: Forecasting Visual Style in Fashion ICCV 2017
On-Demand Learning for Deep Image Restoration ICCV 2017
Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding From Fashion Images ICCV 2017
Learning Spherical Convolution for Fast Features from 360° Imagery NIPS 2017
FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos CVPR 2017
Seeing Invisible Poses: Estimating 3D Body Pose From Egocentric Video CVPR 2017
Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly CVPR 2017
Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images ICCV 2017
Making 360° Video Watchable in 2D: Learning Videography for Click Free Viewing CVPR 2017
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video CVPR 2016
Active Image Segmentation Propagation CVPR 2016
Summary Transfer: Exemplar-Based Subset Selection for Video Summarization CVPR 2016
Pull the Plug? Predicting If Computers or Humans Should Segment Images CVPR 2016
Just Noticeable Differences in Visual Attributes ICCV 2015
Learning Image Representations Tied to Ego-Motion ICCV 2015
Diverse Sequential Subset Selection for Supervised Video Summarization NIPS 2014
Predicting Useful Neighborhoods for Lazy Local Learning NIPS 2014
Zero-shot recognition with unreliable attributes NIPS 2014
Inferring Unseen Views of People CVPR 2014
Decorrelating Semantic Visual Attributes by Resisting the Urge to Share CVPR 2014
Beyond Comparing Image Pairs: Setwise Active Learning for Relative Attributes CVPR 2014
Fine-Grained Visual Comparisons with Local Learning CVPR 2014
Inferring Analogous Attributes CVPR 2014
Analogy-preserving Semantic Embedding for Visual Object Categorization ICML 2013
Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation ICCV 2013
Implied Feedback: Learning Nuances of User Behavior in Image Search ICCV 2013
Attribute Pivots for Guiding Relevance Feedback in Image Search ICCV 2013
Active Learning of an Action Detector from Untrimmed Videos ICCV 2013
Attribute Adaptation for Personalized Image Search ICCV 2013
Reshaping Visual Datasets for Domain Adaptation NIPS 2013
Deformable Spatial Pyramid Matching for Fast Dense Correspondences CVPR 2013
Story-Driven Summarization for Egocentric Video CVPR 2013
Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots CVPR 2013
Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation ICML 2013
Semantic Kernel Forests from Multiple Taxonomies NIPS 2012
Learning a Tree of Metrics with Disjoint Visual Features NIPS 2011
Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning NIPS 2010
Online Metric Learning and Fast Similarity Search NIPS 2008
Multi-Level Active Prediction of Useful Image Annotations for Recognition NIPS 2008
The Pyramid Match Kernel: Efficient Learning with Sets of Features JMLR 2007
Approximate Correspondences in High Dimensions NIPS 2006