Carl Doersch

23 papers · 2012–2025 · 8 conferences · across top CS/AI conferences

Achievements


🏃 Academic Marathon (13) 🌉 Interdisciplinary Bridge 🧭 Keyword Pioneer 🌍 Conference Polyglot (8) 🐝 Cross-Pollinator (4)

🐣 Hot Topic Early Bird 👥 Mega-Team (34) 🤝 Dynamic Duo (12) 🧬 Topic Evolution 💎 Century Club (23) 🗃️ Keyword Collector (96) 🚀 Conference Pioneer

Conferences

NIPS (8) CVPR (5) ICCV (4) ICML (2) CORL (1) ECCV (1) ICLR (1) JMLR (1)

Top co-authors

Andrew Zisserman (12) Joao Carreira (10) Yi Yang (7) Ankush Gupta (4) Skanda Koppula (4) Mehdi S. M. Sajjadi (4) Yusuf Aytar (4) Viorica Patraucean (3) Ignacio Rocco (3) Dilara Gokay (3)

Keywords

video understanding (6) self-supervised learning (4) scene understanding (3) point tracking (3) motion estimation (3) transfer learning (2) representation learning (2) synthetic data generation (2) video tracking (2) few-shot learning (2) depth estimation (2) domain adaptation (2) 3d vision (2) optical flow (2) online learning (1) contrastive learning (1) temporal modeling (1) action recognition (1) 3d reconstruction (1) transformer architecture (1)

Papers

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation · CORL 2025
Direct Motion Models for Assessing Generated Videos · ICML 2025
TAPNext: Tracking Any Point (TAP) as Next Token Prediction · ICCV 2025
Learning from One Continuous Video Stream · CVPR 2024
TAPVid-3D: A Benchmark for Tracking Any Point in 3D · NIPS 2024
Moving Off-the-Grid: Scene-Grounded Video Representations · NIPS 2024
TAPIR: Tracking Any Point with Per-Frame Initialization and Temporal Refinement · ICCV 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models · NIPS 2023
Kubric: A Scalable Dataset Generator · CVPR 2022
Perceiver IO: A General Architecture for Structured Inputs & Outputs · ICLR 2022
Input-Level Inductive Biases for 3D Reconstruction · CVPR 2022
TAP-Vid: A Benchmark for Tracking Any Point in a Video · NIPS 2022
CrossTransformers: spatially-aware few-shot transfer · NIPS 2020
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning · NIPS 2020
Structured agents for physical construction · ICML 2019
Sim2real transfer learning for 3D human pose estimation: motion to the rescue · NIPS 2019
Video Action Transformer Network · CVPR 2019
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild · CVPR 2019
Learning Visual Question Answering by Bootstrapping Hard Attention · ECCV 2018
Multi-Task Self-Supervised Visual Learning · ICCV 2017
Unsupervised Visual Representation Learning by Context Prediction · ICCV 2015
Mid-level Visual Element Discovery as Discriminative Mode Seeking · NIPS 2013
Bounding the Probability of Error for High Precision Optical Character Recognition · JMLR 2012