Ivan Laptev
64 papers · 2011–2025 · 9 top CS/AI conferences
Achievements
🧭 Keyword Pioneer
🌍 Conference Polyglot (9)
🌉 Interdisciplinary Bridge
🌈 Renaissance Researcher (6)
🏃 Academic Marathon (14)
🌟 Keyword Trendsetter Combo (4)
🏠 Conference Loyalist (28)
🤝 Dynamic Duo (30)
🏆 Keyword Champion
👥 Mega-Team (69)
🔬 Deep Specialist (12)
🚀 Conference Pioneer
💎 Century Club (64)
📈 Trend Setter
⚡ Prolific Year (7)
🗃️ Keyword Collector (290)
🔥 Unstoppable (13)
❓ The Questioner
Conferences
CVPR (28)
ICCV (14)
NIPS (9)
CORL (4)
ECCV (4)
ACL (2)
EMNLP (1)
ICML (1)
L4DC (1)
Keywords
video understanding (9)
multimodal learning (8)
action recognition (8)
weakly supervised learning (6)
object detection (5)
self-supervised learning (5)
convolutional neural network (5)
3d reconstruction (5)
robotic manipulation (4)
zero-shot learning (4)
depth estimation (3)
instructional video (3)
large multimodal model (3)
video segmentation (3)
differentiable rendering (3)
semantic segmentation (3)
action localization (3)
hand pose estimation (3)
video question answering (3)
multimodal transformer (3)
Papers
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
CVPR 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
EMNLP 2025
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
CVPR 2025
Learning Feasible Transitions for Efficient Contact Planning
L4DC 2025
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing
ICCV 2025
SUGAR: Pre-training 3D Visual Representations for Robotics
CVPR 2024
PairDETR: Joint Detection and Association of Human Bodies and Faces
CVPR 2024
Mitigating Object Hallucination via Concentric Causal Attention
NIPS 2024
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
CVPR 2024
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023
PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation
CORL 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
ACL 2023
gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
CVPR 2023
VidChapters-7M: Video Chapters at Scale
NIPS 2023
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
CVPR 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
NIPS 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
NIPS 2022
Instruction-driven history-aware policies for robotic manipulations
CORL 2022
Think Global, Act Local: Dual-Scale Graph Transformer for Vision-and-Language Navigation
CVPR 2022
TubeDETR: Spatio-Temporal Video Grounding With Transformers
CVPR 2022
AlignSDF: Pose-Aligned Signed Distance Fields for Hand-Object Reconstruction
ECCV 2022
Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
ECCV 2022
Differentiable rendering with perturbed optimizers
NIPS 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
Segmenter: Transformer for Semantic Segmentation
ICCV 2021
Airbert: In-Domain Pretraining for Vision-and-Language Navigation
ICCV 2021
History Aware Multimodal Transformer for Vision-and-Language Navigation
NIPS 2021
Just Ask: Learning To Answer Questions From Millions of Narrated Videos
ICCV 2021
XCiT: Cross-Covariance Image Transformers
NIPS 2021
Goal-Conditioned Reinforcement Learning with Imagined Subgoals
ICML 2021
Learning Obstacle Representations for Neural Motion Planning
CORL 2020
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos
CORL 2020
Leveraging Photometric Consistency Over Time for Sparsely Supervised Hand-Object Reconstruction
CVPR 2020
Action Modifiers: Learning From Adverbs in Instructional Videos
CVPR 2020
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
CVPR 2020
Learning Interactions and Relationships Between Movie Characters
CVPR 2020
Learning Actionness via Long-range Temporal Order Verification
ECCV 2020
Deep Metric Learning Beyond Binary Supervision
CVPR 2019
Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video
CVPR 2019
Learning Joint Reconstruction of Hands and Manipulated Objects
CVPR 2019
Detecting Unseen Visual Relations Using Analogies
ICCV 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
ICCV 2019
Cross-Task Weakly Supervised Learning From Instructional Videos
CVPR 2019
BodyNet: Volumetric Inference of 3D Human Body Shapes
ECCV 2018
A flexible model for training action localization with varying levels of supervision
NIPS 2018
Learning From Synthetic Humans
CVPR 2017
Weakly-Supervised Learning of Visual Relations
ICCV 2017
Learning From Video and Text via Large-Scale Discriminative Clustering
ICCV 2017
Joint Discovery of Object States and Manipulation Actions
ICCV 2017
Instance-Level Video Segmentation From Object Tracks
CVPR 2016
Thin-Slicing for Pose: Learning to Understand Pose Without Explicit Pose Estimation
CVPR 2016
Unsupervised Learning From Narrated Instruction Videos
CVPR 2016
Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks
CVPR 2015
On Pairwise Costs for Network Flow Multi-Object Tracking
CVPR 2015
Context-Aware CNNs for Person Head Detection
ICCV 2015
Unsupervised Object Discovery and Tracking in Video Collections
ICCV 2015
P-CNN: Pose-Based CNN Features for Action Recognition
ICCV 2015
Weakly-Supervised Alignment of Video With Text
ICCV 2015
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks
CVPR 2014
Efficient Feature Extraction, Encoding and Classification for Action Recognition
CVPR 2014
Pose Estimation and Segmentation of People in 3D Movies
ICCV 2013
Learning person-object interactions for action recognition in still images
NIPS 2011