Carl Vondrick

79 papers · 2011–2025 · 13 top CS/AI conferences

Achievements

🐣 Hot Topic Early Bird 🌍 Conference Polyglot (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🏃 Academic Marathon (14)

🌟 Keyword Trendsetter Combo (6) 🏠 Conference Loyalist (24) 🤝 Dynamic Duo (15) 👥 Mega-Team (56) 👑 Triple Crown 🔬 Deep Specialist (11) 🏆 Keyword Champion 🏆 Grand Slam ❓ The Questioner 🗃️ Keyword Collector (277) 💎 Century Club (79) 🚀 Conference Pioneer 🔥 Unstoppable (11) 📈 Trend Setter ⚡ Prolific Year (12)

Conferences

CVPR (24) ICCV (12) ECCV (11) NIPS (11) ICLR (8) CORL (3) ICML (3) NAACL (2) AAAI (1) ACL (1) EMNLP (1) MLHC (1) UAI (1)

Top co-authors

Chengzhi Mao (15) Ruoshi Liu (14) Antonio Torralba (11) Junfeng Yang (10) Didac Suris (9) Mia Chiquier (7) Sachit Menon (7) Hao Wang (6) Basile Van Hoorick (6) Pavel Tokmakov (5)

Keywords

representation learning (10) self-supervised learning (9) video understanding (8) multimodal learning (6) 3d reconstruction (5) zero-shot learning (4) action recognition (4) visual reasoning (3) transfer learning (3) convolutional neural network (3) differentiable rendering (3) out-of-distribution generalization (3) pose estimation (3) adversarial attack (3) occlusion reasoning (3) video prediction (3) code generation (2) adversarial robustness (2) cross-modal learning (2) object detection (2)

Papers

Generative Data Mining with Longtail-Guided Diffusion ICML 2025
MINERVA: Evaluating Complex Video Reasoning ICCV 2025
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery CVPR 2025
SelfIE: Self-Interpretation of Large Language Model Embeddings ICML 2024
Raidar: geneRative AI Detection viA Rewriting ICLR 2024
INViTE: INterpret and Control Vision-Language Models with Text Explanations ICLR 2024
Sin3DM: Learning a Diffusion Model from a Single 3D Textured Shape ICLR 2024
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment ICLR 2024
MedAutoCorrect: Image-Conditioned Autocorrection in Medical Reporting MLHC 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities EMNLP 2024
Differentiable Robot Rendering CORL 2024
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation CORL 2024
EraseDraw: Learning to Insert Objects by Erasing Them from Images ECCV 2024
Discovering Unwritten Visual Classifiers with Large Language Models ECCV 2024
Controlling the World by Sleight of Hand ECCV 2024
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis ECCV 2024
How Video Meetings Change Your Expression ECCV 2024
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering CVPR 2024
pix2gestalt: Amodal Segmentation by Synthesizing Wholes CVPR 2024
Muscles in Action ICCV 2023
ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation NIPS 2023
Objaverse-XL: A Universe of 10M+ 3D Objects NIPS 2023
Doubly Right Object Recognition: A Why Prompt for Visual Rationales CVPR 2023
FLEX: Full-Body Grasping Without Full-Body Grasps CVPR 2023
Tracking Through Containers and Occluders in the Wild CVPR 2023
Humans As Light Bulbs: 3D Human Reconstruction From Thermal Reflection CVPR 2023
What You Can Reconstruct From a Shadow CVPR 2023
SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors ICCV 2023
SurfsUP: Learning Fluid Simulation for Novel Surfaces ICCV 2023
Zero-1-to-3: Zero-shot One Image to 3D Object ICCV 2023
ViperGPT: Visual Inference via Python Execution for Reasoning ICCV 2023
Landscape Learning for Neural Network Inversion ICCV 2023
Understanding Zero-shot Adversarial Robustness for Large-Scale Models ICLR 2023
Visual Classification via Description from Large Language Models ICLR 2023
Robust Perception through Equivariance ICML 2023
Forget-me-not! Contrastive critics for mitigating posterior collapse UAI 2022
Globetrotter: Connecting Languages by Connecting Images CVPR 2022
There’s a Time and Place for Reasoning Beyond the Image ACL 2022
Causal Transportability for Visual Recognition CVPR 2022
Revealing Occlusions With 4D Neural Fields CVPR 2022
UnweaveNet: Unweaving Activity Stories CVPR 2022
Real-Time Neural Voice Camouflage ICLR 2022
Discrete Representations Strengthen Vision Transformer Robustness ICLR 2022
Private Multiparty Perception for Navigation NIPS 2022
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios NAACL 2022
Representing Spatial Trajectories as Distributions NIPS 2022
It's Time for Artistic Correspondence in Music and Video CVPR 2022
RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System NAACL 2021
Dissecting Image Crops ICCV 2021
Adversarial Attacks Are Reversible With Natural Supervision ICCV 2021
Learning the Predictability of the Future CVPR 2021
Generative Interventions for Causal Learning CVPR 2021
Towards a Unifying Framework for Formal Theories of Novelty AAAI 2021
The Boombox: Visual Reconstruction from Acoustic Vibrations CORL 2021
Learning Goals From Failure CVPR 2021
Listening to Sounds of Silence for Speech Denoising NIPS 2020
Multitask Learning Strengthens Adversarial Robustness ECCV 2020
We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos ECCV 2020
Learning to Learn Words from Visual Scenes ECCV 2020
Oops! Predicting Unintentional Action in Video CVPR 2020
VideoBERT: A Joint Model for Video and Language Representation Learning ICCV 2019
Relational Action Forecasting CVPR 2019
Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding CVPR 2019
Metric Learning for Adversarial Robustness NIPS 2019
AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions CVPR 2018
The Sound of Pixels ECCV 2018
Actor-centric Relation Network ECCV 2018
Tracking Emerges by Colorizing Videos ECCV 2018
Generating the Future With Adversarial Transformers CVPR 2017
Following Gaze in Video ICCV 2017
Generating Videos with Scene Dynamics NIPS 2016
SoundNet: Learning Sound Representations from Unlabeled Video NIPS 2016
Anticipating Visual Representations From Unlabeled Video CVPR 2016
Learning Aligned Cross-Modal Representations From Weakly Aligned Data CVPR 2016
Predicting Motivations of Actions by Leveraging Text CVPR 2016
Learning visual biases from human imagination NIPS 2015
Where are they looking? NIPS 2015
HOGgles: Visualizing Object Detection Features ICCV 2013
Video Annotation and Tracking with Active Learning NIPS 2011