Salman Khan

123 papers · 2017–2026 · 16 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🏃 Academic Marathon (9) 🌍 Conference Polyglot (14) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (10)

🗺️ Taxonomy Completionist (133) 🧭 Keyword Pioneer 🏠 Conference Loyalist (36) 🏆 Grand Slam 🧬 Topic Evolution 🤝 Dynamic Duo (81) 👥 Mega-Team (69) 🏆 Keyword Champion (2) 👑 Triple Crown 🔬 Deep Specialist (27) ❓ The Questioner (2) 📈 Trend Setter ⚡ Prolific Year (18) 🔥 Unstoppable (8) 🚀 Conference Pioneer 💎 Century Club (117) 🗃️ Keyword Collector (408)

Conferences

CVPR (36) ICCV (22) ECCV (10) ACL (9) ICLR (9) EMNLP (8) WACV (8) AAAI (5) MICCAI (5) ICML (3) IJCAI (2) NAACL (2) COLING (1) EACL (1) MIDL (1) NIPS (1)

Top co-authors

Fahad Shahbaz Khan (85) Rao Muhammad Anwer (33) Hisham Cholakkal (31) Muzammal Naseer (23) Mubarak Shah (15) Ming-Hsuan Yang (13) Omkar Thawakar (11) Munawar Hayat (11) Syed Waqas Zamir (10) Fahad Khan (9)

Research topics

Architectures (1) Techniques (1)

Keywords

vision-language model (14) large language model (11) multimodal learning (11) zero-shot learning (10) large multimodal model (7) vision language model (6) convolutional neural network (6) self-supervised learning (6) contrastive learning (5) benchmark evaluation (5) video understanding (5) instruction tuning (5) object detection (5) transfer learning (5) image restoration (5) visual question answering (5) image denoising (5) few-shot learning (5) metric learning (4) incremental learning (4)

Papers

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework ACL 2026
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark ACL 2026
GCA Framework: A GCC Countries–Grounded Dataset and Agentic Pipeline for Climate Decision Support ACL 2026
Bring Your Dreams to Life: Continual Text-to-Video Customization AAAI 2026
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding EACL 2026
A Multi-Agent Diffusion Approach for MRI Anomaly Segmentation via Modality-Specific LoRA Specialization WACV 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities WACV 2026
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation MIDL 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering AAAI 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ACL 2025
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs ACL 2025
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment COLING 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos CVPR 2025
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models CVPR 2025
GroupMamba: Efficient Group-Based Visual State Space Model CVPR 2025
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues CVPR 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model EMNLP 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs EMNLP 2025
MAviS: A Multimodal Conversational Assistant For Avian Species EMNLP 2025
Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments EMNLP 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities EMNLP 2025
Promptception: How Sensitive Are Large Multimodal Models to Prompts? EMNLP 2025
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation ICCV 2025
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation ICCV 2025
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks ICCV 2025
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs ICCV 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications ICCV 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation ICLR 2025
On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning ICLR 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation ICLR 2025
GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder ICML 2025
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing ICML 2025
Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology MICCAI 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark NAACL 2025
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs NAACL 2025
PALO: A Polyglot Large Multimodal Model for 5B People WACV 2025
Enhancing Novel Object Detection via Cooperative Foundational Models WACV 2025
Efficient Video Object Segmentation via Modulated Cross-Attention Memory WACV 2025
COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes WACV 2025
MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation MICCAI 2024
Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning MICCAI 2024
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning CVPR 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation ICML 2024
GeoChat: Grounded Large Vision-Language Model for Remote Sensing CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings CVPR 2024
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning CVPR 2024
GLaMM: Pixel Grounding Large Multimodal Model CVPR 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery CVPR 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding CVPR 2024
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment AAAI 2024
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection ECCV 2024
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation ECCV 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning ECCV 2024
Sentence-level Prompts Benefit Composed Image Retrieval ICLR 2024
BiMediX: Bilingual Medical Mixture of Experts LLM EMNLP 2024
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? NIPS 2024
Modulate Your Spectrum in Self-Supervised Learning ICLR 2024
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts ICLR 2024
A Hybrid Graph Network for Complex Activity Detection in Video WACV 2024
A New Perspective to Boost Performance Fairness For Medical Federated Learning MICCAI 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning MICCAI 2024
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models ACL 2024
XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models ACL 2024
Fine-Tuned CLIP Models Are Efficient Video Learners CVPR 2023
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM EMNLP 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting ICCV 2023
Towards Instance-adaptive Inference for Federated Learning ICCV 2023
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning ICCV 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation ICCV 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition ICCV 2023
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation ICCV 2023
Boosting Adversarial Transferability using Dynamic Cues ICLR 2023
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery CVPR 2023
Burstormer: Burst Image Restoration and Enhancement Transformer CVPR 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection CVPR 2023
Person Image Synthesis via Denoising Diffusion Model CVPR 2023
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection CVPR 2023
MaPLe: Multi-Modal Prompt Learning CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting CVPR 2023
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement CVPR 2023
On Improving Adversarial Transferability of Vision Transformers ICLR 2022
OW-DETR: Open-World Detection Transformer CVPR 2022
Burst Image Restoration and Enhancement CVPR 2022
Restormer: Efficient Transformer for High-Resolution Image Restoration CVPR 2022
Energy-Based Latent Aligner for Incremental Learning CVPR 2022
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition CVPR 2022
Self-Supervised Video Transformer CVPR 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning ECCV 2022
Vision-based Intention and Trajectory Prediction in Autonomous Vehicles: A Survey IJCAI 2022
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation ECCV 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer ECCV 2022
DoodleFormer: Creative Sketch Drawing with Transformers ECCV 2022
Class-Agnostic Object Detection with Multi-modal Transformer ECCV 2022
Conditional Generative Modeling via Learning the Latent Space ICLR 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning ICCV 2021
Handwriting Transformers ICCV 2021
On Generating Transferable Targeted Perturbations ICCV 2021
Orthogonal Projection Loss ICCV 2021
Towards Open World Object Detection CVPR 2021
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning CVPR 2021
Multi-Stage Progressive Image Restoration CVPR 2021
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes WACV 2020
Fine-Grained Recognition: Accounting for Subtle Differences between Similar Classes AAAI 2020
Semi-Supervised Learning for Few-Shot Image-to-Image Translation CVPR 2020
CycleISP: Real Image Restoration via Improved Data Synthesis CVPR 2020
A Self-supervised Approach for Adversarial Robustness CVPR 2020
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces CVPR 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach CVPR 2020
Learning Enriched Features for Real Image Restoration and Enhancement ECCV 2020
Fixing Localization Errors to Improve Image Classification ECCV 2020
Improved Visual-Semantic Alignment for Zero-Shot Object Detection AAAI 2020
Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss ICCV 2019
Gaussian Affinity for Max-Margin Class Imbalanced Learning ICCV 2019
Striking the Right Balance With Uncertainty CVPR 2019
Transductive Learning for Zero-Shot Object Detection ICCV 2019
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks ICCV 2019
Learning deep structured network for weakly supervised change detection IJCAI 2017