Rao Muhammad Anwer

50 papers · 2019–2026 · 13 top CS/AI conferences

Achievements

🐝 Cross-Pollinator (13) 🏃 Academic Marathon (7) 🌉 Interdisciplinary Bridge 🌍 Conference Polyglot (11) 🌈 Renaissance Researcher (9) 👥 Mega-Team (69) 🤝 Dynamic Duo (39) 🔬 Deep Specialist (14) 🧬 Topic Evolution ⚡ Prolific Year (15) 💎 Century Club (47) 🚀 Conference Pioneer 🔥 Unstoppable (8) 🗃️ Keyword Collector (187)

Conferences

ICCV (12) CVPR (10) ECCV (6) EMNLP (5) ACL (4) WACV (3) AAAI (2) ICLR (2) MICCAI (2) EACL (1) ICML (1) MIDL (1) NAACL (1)

Top co-authors

Fahad Shahbaz Khan (40) Hisham Cholakkal (34) Salman Khan (33) Omkar Thawakar (10) Yanwei Pang (9) Mubarak Shah (8) Ling Shao (8) Sahal Shaji Mullappilly (7) Jorma Laaksonen (6) Jiale Cao (6)

Keywords

object detection (9) large language model (8) multimodal learning (7) instruction tuning (4) semantic segmentation (4) convolutional neural network (3) instance segmentation (3) benchmark evaluation (3) medical imaging (3) real-time detection (3) large multimodal model (3) vision-language model (3) zero-shot learning (2) few-shot learning (2) visual reasoning (2) benchmark dataset (2) contrastive learning (2) video retrieval (2) self-supervised learning (2) multilingual nlp (2)

Papers

DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding · EACL 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities · WACV 2026
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation · AAAI 2026
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation · MIDL 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs · EMNLP 2025
MAviS: A Multimodal Conversational Assistant For Avian Species · EMNLP 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning · WACV 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark · NAACL 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation · ICLR 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM · ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts · ACL 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages · CVPR 2025
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining · ICCV 2025
All in One: Visual-Description-Guided Unified Point Cloud Segmentation · ICCV 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications · ICCV 2025
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping · ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models · ICCV 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs · ACL 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities · EMNLP 2025
Semi-supervised Open-World Object Detection · AAAI 2024
XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models · ACL 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery · CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings · CVPR 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning · ECCV 2024
BiMediX: Bilingual Medical Mixture of Experts LLM · EMNLP 2024
Modulate Your Spectrum in Self-Supervised Learning · ICLR 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation · ICML 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning · MICCAI 2024
DB-SAM: Delving into High Quality Universal Medical Image Segmentation · MICCAI 2024
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation · ICCV 2023
SAT: Scale-Augmented Transformer for Person Search · WACV 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection · CVPR 2023
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM · EMNLP 2023
Person Image Synthesis via Denoising Diffusion Model · CVPR 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation · ICCV 2023
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition · CVPR 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer · ECCV 2022
DoodleFormer: Creative Sketch Drawing with Transformers · ECCV 2022
Class-Agnostic Object Detection with Multi-modal Transformer · ECCV 2022
Energy-Based Latent Aligner for Incremental Learning · CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers · CVPR 2022
Handwriting Transformers · ICCV 2021
D2Det: Towards High Quality Object Detection and Instance Segmentation · CVPR 2020
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation · ECCV 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection · ECCV 2020
Deep Contextual Attention for Human-Object Interaction Detection · ICCV 2019
Efficient Featurized Image Pyramid Network for Single Shot Detector · CVPR 2019
Learning Rich Features at High-Speed for Single-Shot Object Detection · ICCV 2019
Mask-Guided Attention Network for Occluded Pedestrian Detection · ICCV 2019
Enriched Feature Guided Refinement Network for Object Detection · ICCV 2019