Salman Khan
123 papers · 2017–2026 · 16 top CS/AI conferences
Achievements
Cross-Pollinator (13)
Academic Marathon (9)
Conference Polyglot (14)
Interdisciplinary Bridge
Renaissance Researcher (10)
Taxonomy Completionist (133)
Keyword Pioneer
Conference Loyalist (36)
Grand Slam
Topic Evolution
Dynamic Duo (81)
Mega-Team (69)
Keyword Champion (2)
Triple Crown
Deep Specialist (27)
The Questioner (2)
Trend Setter
Prolific Year (18)
Unstoppable (8)
Conference Pioneer
Century Club (117)
Keyword Collector (408)
Conferences
CVPR (36)
ICCV (22)
ECCV (10)
ACL (9)
ICLR (9)
EMNLP (8)
WACV (8)
AAAI (5)
MICCAI (5)
ICML (3)
IJCAI (2)
NAACL (2)
COLING (1)
EACL (1)
MIDL (1)
NIPS (1)
Top co-authors
Research topics
Keywords
vision-language model (14)
large language model (11)
multimodal learning (11)
zero-shot learning (10)
large multimodal model (7)
vision language model (6)
convolutional neural network (6)
self-supervised learning (6)
contrastive learning (5)
benchmark evaluation (5)
video understanding (5)
instruction tuning (5)
object detection (5)
transfer learning (5)
image restoration (5)
visual question answering (5)
image denoising (5)
few-shot learning (5)
metric learning (4)
incremental learning (4)
Papers
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
ACL 2026
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
ACL 2026
GCA Framework: A GCC Countries–Grounded Dataset and Agentic Pipeline for Climate Decision Support
ACL 2026
Bring Your Dreams to Life: Continual Text-to-Video Customization
AAAI 2026
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
EACL 2026
A Multi-Agent Diffusion Approach for MRI Anomaly Segmentation via Modality-Specific LoRA Specialization
WACV 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation
MIDL 2025
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering
AAAI 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
ACL 2025
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
AgriCLIP: Adapting CLIP for Agriculture and Livestock via Domain-Specialized Cross-Model Alignment
COLING 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
CVPR 2025
GroupMamba: Efficient Group-Based Visual State Space Model
CVPR 2025
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
CVPR 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
EMNLP 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
EMNLP 2025
MAviS: A Multimodal Conversational Assistant For Avian Species
EMNLP 2025
Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments
EMNLP 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
Promptception: How Sensitive Are Large Multimodal Models to Prompts?
EMNLP 2025
Hierarchical Visual Prompt Learning for Continual Video Instance Segmentation
ICCV 2025
LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation
ICCV 2025
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
ICCV 2025
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model
ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
ICLR 2025
On the Importance of Language-driven Representation Learning for Heterogeneous Federated Learning
ICLR 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
ICLR 2025
GenZSL: Generative Zero-Shot Learning Via Inductive Variational Autoencoder
ICML 2025
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
ICML 2025
Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
MICCAI 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
NAACL 2025
PALO: A Polyglot Large Multimodal Model for 5B People
WACV 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
WACV 2025
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
WACV 2025
COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes
WACV 2025
MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation
MICCAI 2024
Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
MICCAI 2024
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
CVPR 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
CVPR 2024
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
CVPR 2024
S3A: Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment
AAAI 2024
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
ECCV 2024
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
ECCV 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
ECCV 2024
Sentence-level Prompts Benefit Composed Image Retrieval
ICLR 2024
BiMediX: Bilingual Medical Mixture of Experts LLM
EMNLP 2024
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
NIPS 2024
Modulate Your Spectrum in Self-Supervised Learning
ICLR 2024
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
ICLR 2024
A Hybrid Graph Network for Complex Activity Detection in Video
WACV 2024
A New Perspective to Boost Performance Fairness For Medical Federated Learning
MICCAI 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
MICCAI 2024
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
ACL 2024
XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models
ACL 2024
Fine-Tuned CLIP Models Are Efficient Video Learners
CVPR 2023
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM
EMNLP 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
ICCV 2023
Towards Instance-adaptive Inference for Federated Learning
ICCV 2023
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
ICCV 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
ICCV 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
ICCV 2023
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
ICCV 2023
Boosting Adversarial Transferability using Dynamic Cues
ICLR 2023
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery
CVPR 2023
Burstormer: Burst Image Restoration and Enhancement Transformer
CVPR 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
CVPR 2023
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
CVPR 2023
MaPLe: Multi-Modal Prompt Learning
CVPR 2023
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting
CVPR 2023
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
CVPR 2023
On Improving Adversarial Transferability of Vision Transformers
ICLR 2022
OW-DETR: Open-World Detection Transformer
CVPR 2022
Burst Image Restoration and Enhancement
CVPR 2022
Restormer: Efficient Transformer for High-Resolution Image Restoration
CVPR 2022
Energy-Based Latent Aligner for Incremental Learning
CVPR 2022
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
CVPR 2022
Self-Supervised Video Transformer
CVPR 2022
OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning
ECCV 2022
Vision-based Intention and Trajectory Prediction in Autonomous Vehicles: A Survey
IJCAI 2022
Learning Disentanglement with Decoupled Labels for Vision-Language Navigation
ECCV 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
ECCV 2022
DoodleFormer: Creative Sketch Drawing with Transformers
ECCV 2022
Class-Agnostic Object Detection with Multi-modal Transformer
ECCV 2022
Conditional Generative Modeling via Learning the Latent Space
ICLR 2021
Discriminative Region-Based Multi-Label Zero-Shot Learning
ICCV 2021
Handwriting Transformers
ICCV 2021
On Generating Transferable Targeted Perturbations
ICCV 2021
Orthogonal Projection Loss
ICCV 2021
Towards Open World Object Detection
CVPR 2021
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning
CVPR 2021
Multi-Stage Progressive Image Restoration
CVPR 2021
Blended Convolution and Synthesis for Efficient Discrimination of 3D Shapes
WACV 2020
Fine-Grained Recognition: Accounting for Subtle Differences between Similar Classes
AAAI 2020
Semi-Supervised Learning for Few-Shot Image-to-Image Translation
CVPR 2020
CycleISP: Real Image Restoration via Improved Data Synthesis
CVPR 2020
A Self-supervised Approach for Adversarial Robustness
CVPR 2020
AnimalWeb: A Large-Scale Hierarchical Dataset of Annotated Animal Faces
CVPR 2020
iTAML: An Incremental Task-Agnostic Meta-learning Approach
CVPR 2020
Learning Enriched Features for Real Image Restoration and Enhancement
ECCV 2020
Fixing Localization Errors to Improve Image Classification
ECCV 2020
Improved Visual-Semantic Alignment for Zero-Shot Object Detection
AAAI 2020
Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss
ICCV 2019
Gaussian Affinity for Max-Margin Class Imbalanced Learning
ICCV 2019
Striking the Right Balance With Uncertainty
CVPR 2019
Transductive Learning for Zero-Shot Object Detection
ICCV 2019
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks
ICCV 2019
Learning deep structured network for weakly supervised change detection
IJCAI 2017