Fahad Shahbaz Khan

149 papers · 2013–2026 · 15 conferences (top CS/AI venues)

Achievements

🧭 Keyword Pioneer 🐝 Cross-Pollinator (10) 🌍 Conference Polyglot (14) 🏠 Conference Loyalist (58) 🤝 Dynamic Duo (81) 🏆 Grand Slam 👥 Mega-Team (69) 🔬 Deep Specialist (25) 🧬 Topic Evolution 🏆 Keyword Champion (6) ❓ The Questioner 📈 Trend Setter 🗃️ Keyword Collector (498) 🔥 Unstoppable (14) ⚡ Prolific Year (14) 💎 Century Club (144) 🚀 Conference Pioneer

Conferences

CVPR (58) ICCV (26) ECCV (14) NIPS (13) WACV (7) ACL (6) ICLR (5) AAAI (4) EMNLP (4) MICCAI (4) ICML (3) NAACL (2) COLING (1) INTERSPEECH (1) MIDL (1)

Top co-authors

Salman Khan (85) Rao Muhammad Anwer (40) Hisham Cholakkal (36) Ling Shao (21) Mubarak Shah (21) Muzammal Naseer (19) Michael Felsberg (16) Martin Danelljan (14) Ming-Hsuan Yang (14) Munawar Hayat (11)

Research topics

Architectures (1) Techniques (1)

Keywords

object detection (18) vision-language model (11) convolutional neural network (10) zero-shot learning (10) semantic segmentation (10) multimodal learning (10) self-supervised learning (8) representation learning (7) visual tracking (7) transfer learning (7) few-shot learning (7) prompt learning (6) image denoising (6) incremental learning (6) large language model (6) object tracking (6) image restoration (5) domain adaptation (5) video understanding (5) continual learning (5)

Papers

Bring Your Dreams to Life: Continual Text-to-Video Customization · AAAI 2026
AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation · AAAI 2026
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework · ACL 2026
GCA Framework: A GCC Countries–Grounded Dataset and Agentic Pipeline for Climate Decision Support · ACL 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities · WACV 2026
A Multi-Agent Diffusion Approach for MRI Anomaly Segmentation via Modality-Specific LoRA Specialization · WACV 2026
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities · EMNLP 2025
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs · NAACL 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark · NAACL 2025
Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology · MICCAI 2025
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing · ICML 2025
GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder · ICML 2025
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration · ICLR 2025
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt · ICLR 2025
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning · ICLR 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation · ICLR 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation · ICLR 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications · ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models · ICCV 2025
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions · ICCV 2025
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping · ICCV 2025
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model · ICCV 2025
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks · ICCV 2025
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation · ICCV 2025
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation · ICCV 2025
MAviS: A Multimodal Conversational Assistant For Avian Species · EMNLP 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model · EMNLP 2025
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues · CVPR 2025
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation · MIDL 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages · CVPR 2025
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models · CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos · CVPR 2025
GroupMamba: Efficient Group-Based Visual State Space Model · CVPR 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM · ACL 2025
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding · ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts · ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs · ACL 2025
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment · COLING 2025
Efficient Video Object Segmentation via Modulated Cross-Attention Memory · WACV 2025
Enhancing Novel Object Detection via Cooperative Foundational Models · WACV 2025
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning · CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings · CVPR 2024
GeoChat: Grounded Large Vision-Language Model for Remote Sensing · CVPR 2024
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning · CVPR 2024
DB-SAM: Delving into High Quality Universal Medical Image Segmentation · MICCAI 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning · MICCAI 2024
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference · NIPS 2024
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? · NIPS 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation · ICML 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding · CVPR 2024
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment · AAAI 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis · NIPS 2024
Semi-supervised Open-World Object Detection · AAAI 2024
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors · CVPR 2024
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation · CVPR 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery · CVPR 2024
MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation · NIPS 2024
BiMediX: Bilingual Medical Mixture of Experts LLM · EMNLP 2024
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation · ECCV 2024
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection · ECCV 2024
Learning Camouflaged Object Detection from Noisy Pseudo Label · ECCV 2024
Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning · MICCAI 2024
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation · ICCV 2023
3D Indoor Instance Segmentation in an Open-World · NIPS 2023
PromptIR: Prompting for All-in-One Image Restoration · NIPS 2023
Cal-DETR: Calibrated Detection Transformer · NIPS 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization · NIPS 2023
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery · CVPR 2023
Burstormer: Burst Image Restoration and Enhancement Transformer · CVPR 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection · CVPR 2023
Person Image Synthesis via Denoising Diffusion Model · CVPR 2023
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection · CVPR 2023
3D-Aware Multi-Class Image-to-Image Translation With NeRFs · CVPR 2023
MaPLe: Multi-Modal Prompt Learning · CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting · CVPR 2023
Fine-Tuned CLIP Models Are Efficient Video Learners · CVPR 2023
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement · CVPR 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting · ICCV 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation · ICCV 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications · ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition · ICCV 2023
3D Instance Segmentation via Enhanced Spatial and Semantic Supervision · ICCV 2023
Multimodal Multi-Head Convolutional Attention With Various Kernel Sizes for Medical Image Super-Resolution · WACV 2023
SAT: Scale-Augmented Transformer for Person Search · WACV 2023
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection · CVPR 2022
DoodleFormer: Creative Sketch Drawing with Transformers · ECCV 2022
Dense Gaussian Processes for Few-Shot Segmentation · ECCV 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer · ECCV 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning · ECCV 2022
Self-Supervised Video Transformer · CVPR 2022
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition · CVPR 2022
Energy-Based Latent Aligner for Incremental Learning · CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers · CVPR 2022
Restormer: Efficient Transformer for High-Resolution Image Restoration · CVPR 2022
Burst Image Restoration and Enhancement · CVPR 2022
OW-DETR: Open-World Detection Transformer · CVPR 2022
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection · CVPR 2022
COCOA: Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains · WACV 2022
SepTr: Separable Transformer for Audio Spectrogram Processing · INTERSPEECH 2022
Class-Agnostic Object Detection with Multi-modal Transformer · ECCV 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection · NIPS 2022
An Investigation into Whitening Loss for Self-supervised Learning · NIPS 2022
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning · CVPR 2021
Orthogonal Projection Loss · ICCV 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning · ICCV 2021
Handwriting Transformers · ICCV 2021
On Generating Transferable Targeted Perturbations · ICCV 2021
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations · ICCV 2021
Intriguing Properties of Vision Transformers · NIPS 2021
Multi-Stage Progressive Image Restoration · CVPR 2021
Learning To Fuse Asymmetric Feature Maps in Siamese Trackers · CVPR 2021
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning · CVPR 2021
Towards Open World Object Detection · CVPR 2021
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces · CVPR 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach · CVPR 2020
D2Det: Towards High Quality Object Detection and Instance Segmentation · CVPR 2020
Learning Fast and Robust Target Models for Video Object Segmentation · CVPR 2020
MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images · CVPR 2020
A Self-supervised Approach for Adversarial Robustness · CVPR 2020
CycleISP: Real Image Restoration via Improved Data Synthesis · CVPR 2020
Learning Human-Object Interaction Detection Using Interaction Points · CVPR 2020
Semi-Supervised Learning for Few-Shot Image-to-Image Translation · CVPR 2020
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation · ECCV 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection · ECCV 2020
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification · ECCV 2020
Fixing Localization Errors to Improve Image Classification · ECCV 2020
Learning Enriched Features for Real Image Restoration and Enhancement · ECCV 2020
Cross-Domain Transferability of Adversarial Perturbations · NIPS 2019
Enriched Feature Guided Refinement Network for Object Detection · ICCV 2019
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization · ICCV 2019
Learning the Model Update for Siamese Trackers · ICCV 2019
Learning Rich Features at High-Speed for Single-Shot Object Detection · ICCV 2019
Deep Contextual Attention for Human-Object Interaction Detection · ICCV 2019
Object Counting and Instance Segmentation With Image-Level Supervision · CVPR 2019
Mask-Guided Attention Network for Occluded Pedestrian Detection · ICCV 2019
Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition · CVPR 2019
A Generative Appearance Model for End-To-End Video Object Segmentation · CVPR 2019
Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video · CVPR 2019
Efficient Featurized Image Pyramid Network for Single Shot Detector · CVPR 2019
ATOM: Accurate Tracking by Overlap Maximization · CVPR 2019
Random Path Selection for Continual Learning · NIPS 2019
Unveiling the Power of Deep Tracking · ECCV 2018
Density Adaptive Point Set Registration · CVPR 2018
ECO: Efficient Convolution Operators for Tracking · CVPR 2017
A Probabilistic Framework for Color-Based Point Set Registration · CVPR 2016
Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking · CVPR 2016
Learning Spatially Regularized Correlation Filters for Visual Tracking · ICCV 2015
Adaptive Color Attributes for Real-Time Visual Tracking · CVPR 2014
Discriminative Color Descriptors · CVPR 2013