Anurag Arnab

41 papers · 2017–2025 · 8 top CS/AI conferences

Achievements

🏃 Academic Marathon (8) 🌍 Conference Polyglot (8) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11) 🤝 Dynamic Duo (18) 👥 Mega-Team (43) 🔬 Deep Specialist (12) 🧬 Topic Evolution 🏆 Keyword Champion (2) ❓ The Questioner (2) 🗃️ Keyword Collector (188) 🔥 Unstoppable (9) ⚡ Prolific Year (11) 💎 Century Club (41) 🚀 Conference Pioneer

Conferences

CVPR (18) ICCV (6) NIPS (6) ECCV (4) EMNLP (2) ICLR (2) ICML (2) ACML (1)

Top co-authors

Cordelia Schmid (18) Arsha Nagrani (10) Chen Sun (10) Mostafa Dehghani (9) Mario Lucic (5) Xingyi Zhou (5) Yi Tay (4) Xuehan Xiong (4) Shyamal Buch (4) Matthias Minderer (4)

Research topics

Core AI (1)

Keywords

video understanding (13) multimodal learning (6) vision-language model (6) transformer architecture (5) representation learning (4) action recognition (4) semantic segmentation (4) object detection (4) vision transformer (3) computer vision (3) video captioning (3) neural network architecture (2) message passing (2) model scaling (2) efficient computing (2) contrastive learning (2) attention mechanism (2) self-supervised learning (2) video transformer (2) parameter efficiency (2)

Papers

OVFact: Measuring and Improving Open-Vocabulary Factuality for Long Caption Models · EMNLP 2025
Dense Video Object Captioning from Disjoint Supervision · ICLR 2025
From Image to Video: An Empirical Study of Diffusion Representations · ICCV 2025
Flexible Frame Selection for Efficient Video Reasoning · CVPR 2025
Principles of Visual Tokens for Efficient Video Understanding · ICCV 2025
VIEWS: Entity-Aware News Video Captioning · EMNLP 2024
Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition · ECCV 2024
End-to-End Spatio-Temporal Action Localisation with Video Transformers · CVPR 2024
VicTR: Video-conditioned Text Representations for Activity Recognition · CVPR 2024
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation · CVPR 2024
Pixel-Aligned Language Model · CVPR 2024
Time-, Memory- and Parameter-Efficient Visual Adaptation · CVPR 2024
Streaming Dense Video Captioning · CVPR 2024
On Scaling Up a Multilingual Vision and Language Model · CVPR 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels · NIPS 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens · NIPS 2024
Token Turing Machines · CVPR 2023
Does Visual Pretraining Help End-to-End Reasoning? · NIPS 2023
How Can Objects Help Action Recognition? · CVPR 2023
UnLoc: A Unified Framework for Video Localization Tasks · ICCV 2023
Audiovisual Masked Autoencoders · ICCV 2023
Scaling Vision Transformers to 22 Billion Parameters · ICML 2023
Adaptive Computation with Elastic Input Sequence · ICML 2023
End-to-End Generative Pretraining for Multimodal Video Captioning · CVPR 2022
Simple Open-Vocabulary Object Detection with Vision Transformers · ECCV 2022
The Efficiency Misnomer · ICLR 2022
Scenic: A JAX Library for Computer Vision Research and Beyond · CVPR 2022
Learning With Neighbor Consistency for Noisy Labels · CVPR 2022
Multiview Transformers for Video Recognition · CVPR 2022
ViViT: A Video Vision Transformer · ICCV 2021
Unified Graph Structured Models for Video Understanding · ICCV 2021
TokenLearner: Adaptive Space-Time Tokenization for Videos · NIPS 2021
Attention Bottlenecks for Multimodal Fusion · NIPS 2021
Compressive Visual Representations · NIPS 2021
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos · ECCV 2020
Dynamic Graph Message Passing Networks · CVPR 2020
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild · CVPR 2019
Deep Fully-Connected Part-Based Models for Human Pose Estimation · ACML 2018
On the Robustness of Semantic Segmentation Models to Adversarial Attacks · CVPR 2018
Weakly- and Semi-Supervised Panoptic Segmentation · ECCV 2018
Pixelwise Instance Segmentation With a Dynamically Instantiated Network · CVPR 2017