Anurag Arnab
41 papers · 2017–2025 · 8 top CS/AI conferences
Achievements
🏃 Academic Marathon (8)
🌍 Conference Polyglot (8)
🧭 Keyword Pioneer
🌉 Interdisciplinary Bridge
🐝 Cross-Pollinator (11)
🤝 Dynamic Duo (18)
👥 Mega-Team (43)
🔬 Deep Specialist (12)
🧬 Topic Evolution
🏆 Keyword Champion (2)
❓ The Questioner (2)
🗃️ Keyword Collector (188)
🔥 Unstoppable (9)
⚡ Prolific Year (11)
💎 Century Club (41)
🚀 Conference Pioneer
Conferences
CVPR (18)
ICCV (6)
NIPS (6)
ECCV (4)
EMNLP (2)
ICLR (2)
ICML (2)
ACML (1)
Research topics
Keywords
video understanding (13)
multimodal learning (6)
vision-language model (6)
transformer architecture (5)
representation learning (4)
action recognition (4)
semantic segmentation (4)
object detection (4)
vision transformer (3)
computer vision (3)
video captioning (3)
neural network architecture (2)
message passing (2)
model scaling (2)
efficient computing (2)
contrastive learning (2)
attention mechanism (2)
self-supervised learning (2)
video transformer (2)
parameter efficiency (2)
Papers
OVFact: Measuring and Improving Open-Vocabulary Factuality for Long Caption Models
EMNLP 2025
Dense Video Object Captioning from Disjoint Supervision
ICLR 2025
From Image to Video: An Empirical Study of Diffusion Representations
ICCV 2025
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
Principles of Visual Tokens for Efficient Video Understanding
ICCV 2025
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
ECCV 2024
End-to-End Spatio-Temporal Action Localisation with Video Transformers
CVPR 2024
VicTR: Video-conditioned Text Representations for Activity Recognition
CVPR 2024
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
CVPR 2024
Pixel-Aligned Language Model
CVPR 2024
Time-, Memory- and Parameter-Efficient Visual Adaptation
CVPR 2024
Streaming Dense Video Captioning
CVPR 2024
On Scaling Up a Multilingual Vision and Language Model
CVPR 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
NIPS 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
NIPS 2024
Token Turing Machines
CVPR 2023
Does Visual Pretraining Help End-to-End Reasoning?
NIPS 2023
How Can Objects Help Action Recognition?
CVPR 2023
UnLoc: A Unified Framework for Video Localization Tasks
ICCV 2023
Audiovisual Masked Autoencoders
ICCV 2023
Scaling Vision Transformers to 22 Billion Parameters
ICML 2023
Adaptive Computation with Elastic Input Sequence
ICML 2023
End-to-End Generative Pretraining for Multimodal Video Captioning
CVPR 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
ECCV 2022
The Efficiency Misnomer
ICLR 2022
Scenic: A JAX Library for Computer Vision Research and Beyond
CVPR 2022
Learning With Neighbor Consistency for Noisy Labels
CVPR 2022
Multiview Transformers for Video Recognition
CVPR 2022
ViViT: A Video Vision Transformer
ICCV 2021
Unified Graph Structured Models for Video Understanding
ICCV 2021
TokenLearner: Adaptive Space-Time Tokenization for Videos
NIPS 2021
Attention Bottlenecks for Multimodal Fusion
NIPS 2021
Compressive Visual Representations
NIPS 2021
Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
ECCV 2020
Dynamic Graph Message Passing Networks
CVPR 2020
Exploiting Temporal Context for 3D Human Pose Estimation in the Wild
CVPR 2019
Deep Fully-Connected Part-Based Models for Human Pose Estimation
ACML 2018
On the Robustness of Semantic Segmentation Models to Adversarial Attacks
CVPR 2018
Weakly- and Semi-Supervised Panoptic Segmentation
ECCV 2018
Pixelwise Instance Segmentation With a Dynamically Instantiated Network
CVPR 2017