Fahad Shahbaz Khan
149 papers · 2013–2026 · 15 top CS/AI conferences
Achievements
Conference Polyglot (14)
Academic Marathon (13)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (10)
Conference Loyalist (58)
Dynamic Duo (81)
Grand Slam
Mega-Team (69)
Deep Specialist (25)
Topic Evolution
Keyword Champion (6)
The Questioner
Trend Setter
Keyword Collector (498)
Unstoppable (14)
Prolific Year (14)
Century Club (144)
Conference Pioneer
Conferences
CVPR (58)
ICCV (26)
ECCV (14)
NIPS (13)
WACV (7)
ACL (6)
ICLR (5)
AAAI (4)
EMNLP (4)
MICCAI (4)
ICML (3)
NAACL (2)
COLING (1)
INTERSPEECH (1)
MIDL (1)
Research topics
Keywords
object detection (18)
vision-language model (11)
convolutional neural network (10)
zero-shot learning (10)
semantic segmentation (10)
multimodal learning (10)
self-supervised learning (8)
representation learning (7)
visual tracking (7)
transfer learning (7)
few-shot learning (7)
prompt learning (6)
image denoising (6)
incremental learning (6)
large language model (6)
object tracking (6)
image restoration (5)
domain adaptation (5)
video understanding (5)
continual learning (5)
Papers
Bring Your Dreams to Life: Continual Text-to-Video Customization
AAAI 2026
AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation
AAAI 2026
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
ACL 2026
GCA Framework: A GCC Countries–Grounded Dataset and Agentic Pipeline for Climate Decision Support
ACL 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
A Multi-Agent Diffusion Approach for MRI Anomaly Segmentation via Modality-Specific LoRA Specialization
WACV 2026
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
NAACL 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
MICCAI 2025
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
ICML 2025
GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder
ICML 2025
InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration
ICLR 2025
One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
ICLR 2025
ZeroDiff: Solidified Visual-semantic Correlation in Zero-Shot Learning
ICLR 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
ICLR 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
ICLR 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions
ICCV 2025
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
ICCV 2025
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
ICCV 2025
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
ICCV 2025
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
ICCV 2025
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
ICCV 2025
MAviS: A Multimodal Conversational Assistant For Avian Species
EMNLP 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
EMNLP 2025
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
CVPR 2025
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation
MIDL 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
GroupMamba: Efficient Group-Based Visual State Space Model
CVPR 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
ACL 2025
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
COLING 2025
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
WACV 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
WACV 2025
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
CVPR 2024
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
CVPR 2024
DB-SAM: Delving into High Quality Universal Medical Image Segmentation
MICCAI 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
MICCAI 2024
Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
NIPS 2024
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
NIPS 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment
AAAI 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
NIPS 2024
Semi-supervised Open-World Object Detection
AAAI 2024
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
CVPR 2024
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
CVPR 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
NIPS 2024
BiMediX: Bilingual Medical Mixture of Experts LLM
EMNLP 2024
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
ECCV 2024
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
ECCV 2024
Learning Camouflaged Object Detection from Noisy Pseudo Label
ECCV 2024
Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
MICCAI 2024
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
ICCV 2023
3D Indoor Instance Segmentation in an Open-World
NIPS 2023
PromptIR: Prompting for All-in-One Image Restoration
NIPS 2023
Cal-DETR: Calibrated Detection Transformer
NIPS 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
NIPS 2023
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
CVPR 2023
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
CVPR 2023
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
CVPR 2023
3D-Aware Multi-Class Image-to-Image Translation With NeRFs
CVPR 2023
MaPLe: Multi-Modal Prompt Learning
CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting
CVPR 2023
Fine-Tuned CLIP Models Are Efficient Video Learners
CVPR 2023
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
CVPR 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
ICCV 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
ICCV 2023
3D Instance Segmentation via Enhanced Spatial and Semantic Supervision
ICCV 2023
Multimodal Multi-Head Convolutional Attention With Various Kernel Sizes for Medical Image Super-Resolution
WACV 2023
SAT: Scale-Augmented Transformer for Person Search
WACV 2023
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
CVPR 2022
DoodleFormer: Creative Sketch Drawing with Transformers
ECCV 2022
Dense Gaussian Processes for Few-Shot Segmentation
ECCV 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
ECCV 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
ECCV 2022
Self-Supervised Video Transformer
CVPR 2022
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
CVPR 2022
Energy-Based Latent Aligner for Incremental Learning
CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers
CVPR 2022
Restormer: Efficient Transformer for High-Resolution Image Restoration
CVPR 2022
Burst Image Restoration and Enhancement
CVPR 2022
OW-DETR: Open-World Detection Transformer
CVPR 2022
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
CVPR 2022
COCOA: Context-Conditional Adaptation for Recognizing Unseen Classes in Unseen Domains
WACV 2022
SepTr: Separable Transformer for Audio Spectrogram Processing
INTERSPEECH 2022
Class-Agnostic Object Detection with Multi-modal Transformer
ECCV 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
NIPS 2022
An Investigation into Whitening Loss for Self-supervised Learning
NIPS 2022
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
CVPR 2021
Orthogonal Projection Loss
ICCV 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning
ICCV 2021
Handwriting Transformers
ICCV 2021
On Generating Transferable Targeted Perturbations
ICCV 2021
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
ICCV 2021
Intriguing Properties of Vision Transformers
NIPS 2021
Multi-Stage Progressive Image Restoration
CVPR 2021
Learning To Fuse Asymmetric Feature Maps in Siamese Trackers
CVPR 2021
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning
CVPR 2021
Towards Open World Object Detection
CVPR 2021
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
CVPR 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach
CVPR 2020
D2Det: Towards High Quality Object Detection and Instance Segmentation
CVPR 2020
Learning Fast and Robust Target Models for Video Object Segmentation
CVPR 2020
MineGAN: Effective Knowledge Transfer From GANs to Target Domains With Few Images
CVPR 2020
A Self-supervised Approach for Adversarial Robustness
CVPR 2020
CycleISP: Real Image Restoration via Improved Data Synthesis
CVPR 2020
Learning Human-Object Interaction Detection Using Interaction Points
CVPR 2020
Semi-Supervised Learning for Few-Shot Image-to-Image Translation
CVPR 2020
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
ECCV 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection
ECCV 2020
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
ECCV 2020
Fixing Localization Errors to Improve Image Classification
ECCV 2020
Learning Enriched Features for Real Image Restoration and Enhancement
ECCV 2020
Cross-Domain Transferability of Adversarial Perturbations
NIPS 2019
Enriched Feature Guided Refinement Network for Object Detection
ICCV 2019
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
ICCV 2019
Learning the Model Update for Siamese Trackers
ICCV 2019
Learning Rich Features at High-Speed for Single-Shot Object Detection
ICCV 2019
Deep Contextual Attention for Human-Object Interaction Detection
ICCV 2019
Object Counting and Instance Segmentation With Image-Level Supervision
CVPR 2019
Mask-Guided Attention Network for Occluded Pedestrian Detection
ICCV 2019
Out-Of-Distribution Detection for Generalized Zero-Shot Action Recognition
CVPR 2019
A Generative Appearance Model for End-To-End Video Object Segmentation
CVPR 2019
Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video
CVPR 2019
Efficient Featurized Image Pyramid Network for Single Shot Detector
CVPR 2019
ATOM: Accurate Tracking by Overlap Maximization
CVPR 2019
Random Path Selection for Continual Learning
NIPS 2019
Unveiling the Power of Deep Tracking
ECCV 2018
Density Adaptive Point Set Registration
CVPR 2018
ECO: Efficient Convolution Operators for Tracking
CVPR 2017
A Probabilistic Framework for Color-Based Point Set Registration
CVPR 2016
Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking
CVPR 2016
Learning Spatially Regularized Correlation Filters for Visual Tracking
ICCV 2015
Adaptive Color Attributes for Real-Time Visual Tracking
CVPR 2014
Discriminative Color Descriptors
CVPR 2013