Josef Sivic
62 papers · 2009–2025 · 9 conferences · across top CS/AI conferences
Achievements
Cross-Pollinator (12)
Conference Polyglot (9)
Academic Marathon (16)
Keyword Pioneer
Renaissance Researcher (12)
Interdisciplinary Bridge
Keyword Trendsetter Combo (6)
Conference Loyalist (30)
Dynamic Duo (25)
Mega-Team (31)
Deep Specialist (10)
Keyword Champion
Prolific Year (5)
Trend Setter
Conference Pioneer
The Questioner (3)
Keyword Collector (230)
Unstoppable (13)
Century Club (62)
Conferences
CVPR (30)
ICCV (13)
NIPS (7)
ECCV (4)
ICLR (3)
CORL (2)
AAAI (1)
EMNLP (1)
RSS (1)
Keywords
video understanding (9)
convolutional neural network (8)
multimodal learning (6)
visual localization (6)
pose estimation (5)
object detection (5)
weakly supervised learning (5)
3d reconstruction (4)
image matching (4)
image retrieval (4)
render and compare (3)
visual place recognition (3)
zero-shot learning (3)
place recognition (3)
cross-modal retrieval (3)
self-supervised learning (3)
video retrieval (3)
vision-language model (3)
video segmentation (2)
semantic segmentation (2)
Papers
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
CVPR 2025
Improving Personalized Search with Regularized Low-Rank Parameter Updates
CVPR 2025
Learning to engineer protein flexibility
ICLR 2025
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
ICLR 2025
ResidualViT for Efficient Temporally Dense Video Encoding
ICCV 2025
Discovering Divergent Representations between Text-to-Image Models
ICCV 2025
Large-scale Pre-training for Grounded Video Caption Generation
ICCV 2025
Learning to design protein-protein interactions with enhanced generalization
ICLR 2024
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
CVPR 2024
MassSpecGym: A benchmark for the discovery and identification of molecules
NIPS 2024
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
CVPR 2023
Language-Guided Music Recommendation for Video via Prompt Analogies
CVPR 2023
Meta-Personalizing Vision-Language Models To Find Named Instances in Video
CVPR 2023
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
NIPS 2023
VidChapters-7M: Video Chapters at Scale
NIPS 2023
TubeDETR: Spatio-Temporal Video Grounding With Transformers
CVPR 2022
Look for the Change: Learning Object States and State-Modifying Actions From Untrimmed Web Videos
CVPR 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
NIPS 2022
MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare
CORL 2022
Collision Detection Accelerated: An Optimization Perspective
RSS 2022
Drive&Segment: Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation
ECCV 2022
Focal Length and Object Pose Estimation via Render and Compare
CVPR 2022
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
ICCV 2021
Artificial Dummies for Urban Dataset Augmentation
AAAI 2021
Single-View Robot Pose and Joint Angle Estimation via Render & Compare
CVPR 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers
CVPR 2021
Just Ask: Learning To Answer Questions From Millions of Narrated Videos
ICCV 2021
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
ECCV 2020
End-to-End Learning of Visual Representations From Uncurated Instructional Videos
CVPR 2020
Learning Object Manipulation Skills via Approximate State Estimation from Real Videos
CORL 2020
Learning Actionness via Long-range Temporal Order Verification
ECCV 2020
CosyPose: Consistent multi-view multi-object 6D pose estimation
ECCV 2020
Estimating 3D Motion and Forces of Person-Object Interactions From Monocular Video
CVPR 2019
Detecting Unseen Visual Relations Using Analogies
ICCV 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
ICCV 2019
Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization
ICCV 2019
Cross-Task Weakly Supervised Learning From Instructional Videos
CVPR 2019
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
CVPR 2019
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
CVPR 2018
InLoc: Indoor Visual Localization With Dense Matching and View Synthesis
CVPR 2018
Neighbourhood Consensus Networks
NIPS 2018
End-to-End Weakly-Supervised Semantic Alignment
CVPR 2018
Localizing Moments in Video with Temporal Language
EMNLP 2018
Learning From Video and Text via Large-Scale Discriminative Clustering
ICCV 2017
Localizing Moments in Video With Natural Language
ICCV 2017
ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification
CVPR 2017
Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?
CVPR 2017
Convolutional Neural Network Architecture for Geometric Matching
CVPR 2017
Joint Discovery of Object States and Manipulation Actions
ICCV 2017
Weakly-Supervised Learning of Visual Relations
ICCV 2017
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
CVPR 2016
Unsupervised Learning From Narrated Instruction Videos
CVPR 2016
On Pairwise Costs for Network Flow Multi-Object Tracking
CVPR 2015
24/7 Place Recognition by View Synthesis
CVPR 2015
Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks
CVPR 2015
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks
CVPR 2014
Seeing 3D Chairs: Exemplar Part-based 2D-3D Alignment using a Large Dataset of CAD Models
CVPR 2014
Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
CVPR 2013
Pose Estimation and Segmentation of People in 3D Movies
ICCV 2013
Visual Place Recognition with Repetitive Structures
CVPR 2013
Learning person-object interactions for action recognition in still images
NIPS 2011
Segmenting Scenes by Matching Image Composites
NIPS 2009