Mubarak Shah
133 papers · 2013–2026 · 11 top CS/AI conferences
Achievements
Conference Polyglot (10) · Academic Marathon (13) · Keyword Pioneer · Interdisciplinary Bridge · Cross-Pollinator (10)
Keyword Pioneer
Cross-Pollinator (10)
Conference Polyglot (10)
Conference Loyalist (24)
Dynamic Duo (21)
Grand Slam
Mega-Team (69)
Topic Pioneer
Deep Specialist (18)
Topic Evolution
Keyword Champion (4)
The Questioner (5)
Trend Setter
Unstoppable (14)
Keyword Collector (442)
Century Club (129)
Prolific Year (6)
Conference Pioneer
Conferences
CVPR (60)
ICCV (24)
ECCV (18)
ICLR (9)
AAAI (7)
NIPS (6)
WACV (3)
ACL (2)
EMNLP (2)
EACL (1)
ICML (1)
Keywords
video understanding (17)
object detection (9)
self-supervised learning (9)
contrastive learning (8)
zero-shot learning (7)
representation learning (6)
attention mechanism (6)
semi-supervised learning (6)
action recognition (6)
convolutional neural network (5)
curriculum learning (5)
video anomaly detection (4)
video analysis (4)
multimodal learning (4)
person re-identification (4)
semantic segmentation (4)
weakly supervised learning (4)
multi-label classification (4)
video retrieval (4)
large multimodal model (4)
Papers
GAEA: A Geolocation Aware Conversational Assistant
WACV 2026
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
AAAI 2026
SMPRO: Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking
AAAI 2026
Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs
EACL 2026
ViLL-E: Video LLM Embeddings for Retrieval
ACL 2026
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
M-LLM Based Video Frame Selection for Efficient Video Understanding
CVPR 2025
CoLLM: A Large Language Model for Composed Image Retrieval
CVPR 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
CVPR 2025
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
CVPR 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
DLCR: A Generative Data Expansion Framework via Diffusion for Clothes-Changing Person Re-ID
WACV 2025
MGD³: Mode-Guided Dataset Distillation using Diffusion Models
ICML 2025
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
ICLR 2025
ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
ICLR 2025
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
ICLR 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
ICLR 2025
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
ICLR 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
EMNLP 2025
Test-Time Retrieval-Augmented Adaptation for Vision-Language Models
ICCV 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
ICCV 2025
GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
ICCV 2025
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
AAAI 2024
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
ECCV 2024
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
ECCV 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
ECCV 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
ECCV 2024
DVANet: Disentangling View and Action Features for Multi-View Action Recognition
AAAI 2024
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
ECCV 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
ECCV 2024
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
ECCV 2024
Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
ECCV 2024
Open Vocabulary Multi-Label Video Classification
ECCV 2024
PTQ4DiT: Post-training Quantization for Diffusion Transformers
NIPS 2024
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
CVPR 2024
VidLA: Video-Language Alignment at Scale
CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
CVPR 2024
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes
CVPR 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
CVPR 2023
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition
CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting
CVPR 2023
Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles between Client Data Subspaces
AAAI 2023
Diffusion Action Segmentation
ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
ICCV 2023
When Do Curricula Work in Federated Learning?
ICCV 2023
Preserving Modality Structure Improves Multi-Modal Learning
ICCV 2023
TeD-SPAD: Temporal Distinctiveness for Self-Supervised Privacy-Preservation for Video Anomaly Detection
ICCV 2023
CDFSL-V: Cross-Domain Few-Shot Learning for Videos
ICCV 2023
Dual Student Networks for Data-Free Model Stealing
ICLR 2023
Learning Situation Hyper-Graphs for Video Question Answering
CVPR 2023
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization
NIPS 2023
Contrastive Self-Supervised Learning Leads to Higher Adversarial Susceptibility
AAAI 2023
Re-calibrating Feature Attributions for Model Interpretation
ICLR 2023
PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization
CVPR 2023
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
Class Prototypes Based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos
CVPR 2023
Weakly Supervised Grounding for VQA in Vision-Language Transformers
ECCV 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
ECCV 2022
GAMa: Cross-view Video Geo-localization
ECCV 2022
Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
NIPS 2022
Transferable 3D Adversarial Textures Using End-to-End Optimization
WACV 2022
TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization
CVPR 2022
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
CVPR 2022
SPAct: Self-Supervised Privacy Preservation for Action Recognition
CVPR 2022
UniCon: Combating Label Noise Through Uniform Selection and Contrastive Learning
CVPR 2022
OW-DETR: Open-World Detection Transformer
CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers
CVPR 2022
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
CVPR 2022
Self-Joint Supervised Learning
ICLR 2022
Towards Realistic Semi-Supervised Learning
ECCV 2022
In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning
ICLR 2021
Reformulating Zero-shot Action Recognition for Multi-label Actions
NIPS 2021
Dogfight: Detecting Drones From Drones Videos
CVPR 2021
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
CVPR 2021
Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces
CVPR 2021
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
CVPR 2021
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning
CVPR 2021
Modeling Multi-Label Action Dependencies for Temporal Action Localization
CVPR 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning
ICCV 2021
Handwriting Transformers
ICCV 2021
Video Geo-Localization Employing Geo-Temporal Feature Learning and GPS Trajectory Smoothing
ICCV 2021
Face Image Retrieval With Attribute Manipulation
ICCV 2021
Visual-Textual Capsule Routing for Text-Based Video Segmentation
CVPR 2020
Select to Better Learn: Fast and Accurate Deep Learning Using Data Selection From Nonlinear Manifolds
CVPR 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
EMNLP 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection
ECCV 2020
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking
ECCV 2020
Multi-view Action Recognition using Cross-view Video Prediction
ECCV 2020
SubSpace Capsule Network
AAAI 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach
CVPR 2020
CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing
ICCV 2019
Bridging the Domain Gap for Ground-to-Aerial Image Matching
ICCV 2019
Pay Attention! - Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention
CVPR 2019
Iterative Projection and Matching: Finding Structure-Preserving Representatives and Its Application to Computer Vision
CVPR 2019
Unsupervised Meta-Learning for Few-Shot Image Classification
NIPS 2019
Deep Constrained Dominant Sets for Person Re-Identification
ICCV 2019
Visual Text Correction
ECCV 2018
Human Semantic Parsing for Person Re-Identification
CVPR 2018
Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds
ECCV 2018
VideoCapsuleNet: A Simplified Network for Action Detection
NIPS 2018
ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information
CVPR 2018
Real-World Anomaly Detection in Surveillance Videos
CVPR 2018
Cross-View Image Matching for Geo-Localization in Urban Environments
CVPR 2017
Generative Adversarial Networks Conditioned by Brain Signals
ICCV 2017
Video Fill in the Blank Using LR/RL LSTMs With Spatial-Temporal Attentions
ICCV 2017
Unsupervised Action Discovery and Localization in Videos
ICCV 2017
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos
ICCV 2017
Semi Supervised Semantic Segmentation Using Generative Adversarial Network
ICCV 2017
Improving Facial Attribute Prediction Using Semantic Segmentation
CVPR 2017
Deep Learning Human Mind for Automated Visual Classification
CVPR 2017
What If We Do Not Have Multiple Videos of the Same Action? -- Video Action Localization Using Web Images
CVPR 2016
Fast Zero-Shot Image Tagging
CVPR 2016
Scene Labeling Using Sparse Precision Matrix
CVPR 2016
Predicting the Where and What of Actors and Actions Through Online Action Localization
CVPR 2016
GMMCP Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking
CVPR 2015
Geo-Semantic Segmentation
CVPR 2015
Target Identity-Aware Network Flow for Online Multiple Target Tracking
CVPR 2015
Action Localization in Videos Through Context Walk
ICCV 2015
Human Pose Estimation in Videos
ICCV 2015
Video Classification using Semantic Concept Co-occurrences
CVPR 2014
Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts
CVPR 2014
Who Do I Look Like? Determining Parent-Offspring Resemblance via Gated Autoencoders
CVPR 2014
NMF-KNN: Image Annotation using Weighted Multi-view Non-negative Matrix Factorization
CVPR 2014
GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor
CVPR 2014
Improving Semantic Concept Detection through the Dictionary of Visually-distinct Elements
CVPR 2014
Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions
CVPR 2013
Face Recognition in Movie Trailers via Mean Sequence Sparse Representation-Based Classification
CVPR 2013
Improving an Object Detector and Extracting Regions Using Superpixels
CVPR 2013
Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
CVPR 2013
Multi-source Multi-scale Counting in Extremely Dense Crowd Images
CVPR 2013
Spatiotemporal Deformable Part Models for Action Detection
CVPR 2013