Christoph Feichtenhofer

46 papers · 2014–2025 · 8 top CS/AI conferences

Achievements


🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🌍 Conference Polyglot (8) 🏃 Academic Marathon (11) 🗺️ Taxonomy Completionist (64)

🧭 Keyword Pioneer 🐣 Hot Topic Early Bird 🏠 Conference Loyalist (22) 🔬 Deep Specialist (12) 🤝 Dynamic Duo (15) 👑 Triple Crown 🧬 Topic Evolution 👥 Mega-Team (85) 🏆 Keyword Champion 🗃️ Keyword Collector (172) 📈 Trend Setter ❓ The Questioner ⚡ Prolific Year (9) 💎 Century Club (46) 🔥 Unstoppable (12)

Conferences

CVPR (22) ICCV (9) NIPS (6) ICLR (4) EMNLP (2) ACL (1) ICML (1) IJCNLP (1)

Top co-authors

Haoqi Fan (15) Jitendra Malik (11) Yanghao Li (11) Po-Yao Huang (10) Hu Xu (9) Chao-Yuan Wu (9) Axel Pinz (8) Gargi Ghosh (7) Luke Zettlemoyer (6) Bo Xiong (6)

Keywords

video understanding (12) video recognition (8) action recognition (7) vision transformer (7) self-supervised learning (7) convolutional neural network (6) representation learning (5) masked autoencoder (5) image classification (5) contrastive learning (4) temporal modeling (3) video classification (3) spatiotemporal feature (3) video analysis (3) transfer learning (3) video action recognition (3) 3d pose estimation (2) zero-shot learning (2) multiview learning (2) model compression (2)

Papers

An Empirical Study of Autoregressive Pre-training from Videos · ICCV 2025
SAM 2: Segment Anything in Images and Videos · ICLR 2025
Demystifying CLIP Data · ICLR 2024
Altogether: Image Captioning via Re-aligning Alt-text · EMNLP 2024
Window Attention is Bugged: How not to Interpolate Position Embeddings · ICLR 2024
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles · ICML 2023
Token Merging: Your ViT But Faster · ICLR 2023
Diffusion Models as Masked Autoencoders · ICCV 2023
The Effectiveness of MAE Pre-Pretraining for Billion-Scale Pretraining · ICCV 2023
CiT: Curation in Training for Effective Vision-Language Data · ICCV 2023
Multiview Compressive Coding for 3D Reconstruction · CVPR 2023
Scaling Language-Image Pre-Training via Masking · CVPR 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition · CVPR 2023
MAViL: Masked Audio-Video Learners · NIPS 2023
A ConvNet for the 2020s · CVPR 2022
TrackFormer: Multi-Object Tracking With Transformers · CVPR 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video · CVPR 2022
Masked Autoencoders As Spatiotemporal Learners · NIPS 2022
Masked Autoencoders that Listen · NIPS 2022
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition · CVPR 2022
Reversible Vision Transformers · CVPR 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training · CVPR 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection · CVPR 2022
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding · IJCNLP 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers · NIPS 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding · ACL 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning · CVPR 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding · EMNLP 2021
Multiscale Vision Transformers · ICCV 2021
Multiview Pseudo-Labeling for Semi-Supervised Learning From Video · ICCV 2021
X3D: Expanding Architectures for Efficient Video Recognition · CVPR 2020
Ego-Topo: Environment Affordances From Egocentric Video · CVPR 2020
A Multigrid Method for Efficiently Training Video Models · CVPR 2020
Learning Temporal Pose Estimation from Sparsely-Labeled Videos · NIPS 2019
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training · CVPR 2019
Long-Term Feature Banks for Detailed Video Understanding · CVPR 2019
SlowFast Networks for Video Recognition · ICCV 2019
Grounded Human-Object Interaction Hotspots From Video · ICCV 2019
What Have We Learned From Deep Representations for Action Recognition? · CVPR 2018
Detect to Track and Track to Detect · ICCV 2017
Temporal Residual Networks for Dynamic Scene Recognition · CVPR 2017
Spatiotemporal Multiplier Networks for Video Action Recognition · CVPR 2017
Convolutional Two-Stream Network Fusion for Video Action Recognition · CVPR 2016
Spatiotemporal Residual Networks for Video Action Recognition · NIPS 2016
Dynamically Encoded Actions Based on Spacetime Saliency · CVPR 2015
Bags of Spacetime Energies for Dynamic Scene Recognition · CVPR 2014