Mubarak Shah

133 papers · 2013–2026 · 11 conferences · across top CS/AI conferences

Achievements

🌍 Conference Polyglot (10) 🏃 Academic Marathon (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (10)

🏠 Conference Loyalist (24) 🤝 Dynamic Duo (21) 🏆 Grand Slam 👥 Mega-Team (69) 🌱 Topic Pioneer 🔬 Deep Specialist (18) 🧬 Topic Evolution 🏆 Keyword Champion (4) ❓ The Questioner (5) 📈 Trend Setter 🔥 Unstoppable (14) 🗃️ Keyword Collector (442) 💎 Century Club (129) ⚡ Prolific Year (6) 🚀 Conference Pioneer

Conferences

CVPR (60) ICCV (24) ECCV (18) ICLR (9) AAAI (7) NIPS (6) WACV (3) ACL (2) EMNLP (2) EACL (1) ICML (1)

Top co-authors

Fahad Shahbaz Khan (21) Salman Khan (15) Mamshad Nayeem Rizve (14) Chen Chen (12) Ishan Rajendrakumar Dave (8) Rao Muhammad Anwer (8) Nazanin Rahnavard (7) Son Tran (7) Hisham Cholakkal (7) Kevin Duarte (7)

Keywords

video understanding (17) object detection (9) self-supervised learning (9) contrastive learning (8) zero-shot learning (7) representation learning (6) attention mechanism (6) semi-supervised learning (6) action recognition (6) convolutional neural network (5) curriculum learning (5) video anomaly detection (4) video analysis (4) multimodal learning (4) person re-identification (4) semantic segmentation (4) weakly supervised learning (4) multi-label classification (4) video retrieval (4) large multimodal model (4)

Papers

GAEA: A Geolocation Aware Conversational Assistant WACV 2026
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge AAAI 2026
SMPRO: Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking AAAI 2026
Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs EACL 2026
ViLL-E: Video LLM Embeddings for Retrieval ACL 2026
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications ICCV 2025
M-LLM Based Video Frame Selection for Efficient Video Understanding CVPR 2025
CoLLM: A Large Language Model for Composed Image Retrieval CVPR 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages CVPR 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models CVPR 2025
Curriculum Direct Preference Optimization for Diffusion and Consistency Models CVPR 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs ACL 2025
DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID WACV 2025
MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models ICML 2025
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention ICLR 2025
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks ICLR 2025
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition ICLR 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation ICLR 2025
Exploring Local Memorization in Diffusion Models via Bright Ending Attention ICLR 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model EMNLP 2025
Test-Time Retrieval-Augmented Adaptation for Vision-Language Models ICCV 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning ICCV 2025
GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space ICCV 2025
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision AAAI 2024
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition ECCV 2024
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets ECCV 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs ECCV 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ECCV 2024
DVANet: Disentangling View and Action Features for Multi-View Action Recognition AAAI 2024
Möbius Transform for Mitigating Perspective Distortions in Representation Learning ECCV 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale ECCV 2024
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers ECCV 2024
Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density ECCV 2024
Open Vocabulary Multi-Label Video Classification ECCV 2024
PTQ4DiT: Post-training Quantization for Diffusion Transformers NIPS 2024
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors CVPR 2024
VidLA: Video-Language Alignment at Scale CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings CVPR 2024
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? CVPR 2024
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes CVPR 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition CVPR 2023
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting CVPR 2023
Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles between Client Data Subspaces AAAI 2023
Diffusion Action Segmentation ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition ICCV 2023
When Do Curricula Work in Federated Learning? ICCV 2023
Preserving Modality Structure Improves Multi-Modal Learning ICCV 2023
TeD-SPAD: Temporal Distinctiveness for Self-Supervised Privacy-Preservation for Video Anomaly Detection ICCV 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos ICCV 2023
Dual Student Networks for Data-Free Model Stealing ICLR 2023
Learning Situation Hyper-Graphs for Video Question Answering CVPR 2023
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization NIPS 2023
Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility AAAI 2023
Re-calibrating Feature Attributions for Model Interpretation ICLR 2023
PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization CVPR 2023
Person Image Synthesis via Denoising Diffusion Model CVPR 2023
Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos CVPR 2023
Weakly Supervised Grounding for VQA in Vision-Language Transformers ECCV 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning ECCV 2022
GAMa: Cross-view Video Geo-localization ECCV 2022
Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation NIPS 2022
Transferable 3D Adversarial Textures Using End-to-End Optimization WACV 2022
TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization CVPR 2022
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection CVPR 2022
SPAct: Self-Supervised Privacy Preservation for Action Recognition CVPR 2022
UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning CVPR 2022
OW-DETR: Open-World Detection Transformer CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers CVPR 2022
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection CVPR 2022
Self-Joint Supervised Learning ICLR 2022
Towards Realistic Semi-Supervised Learning ECCV 2022
In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning ICLR 2021
Reformulating Zero-shot Action Recognition for Multi-label Actions NIPS 2021
Dogfight: Detecting Drones From Drones Videos CVPR 2021
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning CVPR 2021
Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces CVPR 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules CVPR 2021
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning CVPR 2021
Modeling Multi-Label Action Dependencies for Temporal Action Localization CVPR 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning ICCV 2021
Handwriting Transformers ICCV 2021
Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing ICCV 2021
Face Image Retrieval With Attribute Manipulation ICCV 2021
Visual-Textual Capsule Routing for Text-Based Video Segmentation CVPR 2020
Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds CVPR 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering EMNLP 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection ECCV 2020
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking ECCV 2020
Multi-view Action Recognition using Cross-view Video Prediction ECCV 2020
SubSpace Capsule Network AAAI 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach CVPR 2020
CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing ICCV 2019
Bridging the Domain Gap for Ground-to-Aerial Image Matching ICCV 2019
Pay Attention! - Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention CVPR 2019
Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision CVPR 2019
Unsupervised Meta-Learning for Few-Shot Image Classification NIPS 2019
Deep Constrained Dominant Sets for Person Re-Identification ICCV 2019
Visual Text Correction ECCV 2018
Human Semantic Parsing for Person Re-Identification CVPR 2018
Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds ECCV 2018
VideoCapsuleNet: A Simplified Network for Action Detection NIPS 2018
ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information CVPR 2018
Real-World Anomaly Detection in Surveillance Videos CVPR 2018
Cross-View Image Matching for Geo-Localization in Urban Environments CVPR 2017
Generative Adversarial Networks Conditioned by Brain Signals ICCV 2017
Video Fill in the Blank Using LR/RL LSTMs With Spatial-Temporal Attentions ICCV 2017
Unsupervised Action Discovery and Localization in Videos ICCV 2017
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos ICCV 2017
Semi Supervised Semantic Segmentation Using Generative Adversarial Network ICCV 2017
Improving Facial Attribute Prediction Using Semantic Segmentation CVPR 2017
Deep Learning Human Mind for Automated Visual Classification CVPR 2017
What If We Do Not Have Multiple Videos of the Same Action? -- Video Action Localization Using Web Images CVPR 2016
Fast Zero-Shot Image Tagging CVPR 2016
Scene Labeling Using Sparse Precision Matrix CVPR 2016
Predicting the Where and What of Actors and Actions Through Online Action Localization CVPR 2016
GMMCP Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking CVPR 2015
Geo-Semantic Segmentation CVPR 2015
Target Identity-Aware Network Flow for Online Multiple Target Tracking CVPR 2015
Action Localization in Videos Through Context Walk ICCV 2015
Human Pose Estimation in Videos ICCV 2015
Video Classification using Semantic Concept Co-occurrences CVPR 2014
Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts CVPR 2014
Who Do I Look Like? Determining Parent-Offspring Resemblance via Gated Autoencoders CVPR 2014
NMF-KNN: Image Annotation using Weighted Multi-view Non-negative Matrix Factorization CVPR 2014
GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor CVPR 2014
Improving Semantic Concept Detection through the Dictionary of Visually-distinct Elements CVPR 2014
Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions CVPR 2013
Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification CVPR 2013
Improving an Object Detector and Extracting Regions Using Superpixels CVPR 2013
Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video CVPR 2013
Multi-source Multi-scale Counting in Extremely Dense Crowd Images CVPR 2013
Spatiotemporal Deformable Part Models for Action Detection CVPR 2013