Rao Muhammad Anwer
50 papers · 2019–2026 · 13 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (13)
🏃 Academic Marathon (7)
🌉 Interdisciplinary Bridge
🌍 Conference Polyglot (11)
🌈 Renaissance Researcher (9)
👥 Mega-Team (69)
🤝 Dynamic Duo (39)
🔬 Deep Specialist (14)
🧬 Topic Evolution
⚡ Prolific Year (15)
💎 Century Club (47)
🚀 Conference Pioneer
🔥 Unstoppable (8)
🗃️ Keyword Collector (187)
Conferences
ICCV (12)
CVPR (10)
ECCV (6)
EMNLP (5)
ACL (4)
WACV (3)
AAAI (2)
ICLR (2)
MICCAI (2)
EACL (1)
ICML (1)
MIDL (1)
NAACL (1)
Keywords
object detection (9)
large language model (8)
multimodal learning (7)
instruction tuning (4)
semantic segmentation (4)
convolutional neural network (3)
instance segmentation (3)
benchmark evaluation (3)
medical imaging (3)
real-time detection (3)
large multimodal model (3)
vision-language model (3)
zero-shot learning (2)
few-shot learning (2)
visual reasoning (2)
benchmark dataset (2)
contrastive learning (2)
video retrieval (2)
self-supervised learning (2)
multilingual nlp (2)
Papers
DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
EACL 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
AAAI 2026
Real-time Breast Lesion Detection in Videos via Spatial-temporal Feature Aggregation
MIDL 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
EMNLP 2025
MAviS: A Multimodal Conversational Assistant For Avian Species
EMNLP 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
WACV 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
ICLR 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
ACL 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
ICCV 2025
All in One: Visual-Description-Guided Unified Point Cloud Segmentation
ICCV 2025
Beyond Simple Edits: Composed Video Retrieval with Dense Modifications
ICCV 2025
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
Semi-supervised Open-World Object Detection
AAAI 2024
XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models
ACL 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
CVPR 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
ECCV 2024
BiMediX: Bilingual Medical Mixture of Experts LLM
EMNLP 2024
Modulate Your Spectrum in Self-Supervised Learning
ICLR 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning
MICCAI 2024
DB-SAM: Delving into High Quality Universal Medical Image Segmentation
MICCAI 2024
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
ICCV 2023
SAT: Scale-Augmented Transformer for Person Search
WACV 2023
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
CVPR 2023
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM
EMNLP 2023
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
Spatio-Temporal Relation Modeling for Few-Shot Action Recognition
CVPR 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
ECCV 2022
DoodleFormer: Creative Sketch Drawing with Transformers
ECCV 2022
Class-Agnostic Object Detection with Multi-modal Transformer
ECCV 2022
Energy-Based Latent Aligner for Incremental Learning
CVPR 2022
PSTR: End-to-End One-Step Person Search With Transformers
CVPR 2022
Handwriting Transformers
ICCV 2021
D2Det: Towards High Quality Object Detection and Instance Segmentation
CVPR 2020
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
ECCV 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection
ECCV 2020
Deep Contextual Attention for Human-Object Interaction Detection
ICCV 2019
Efficient Featurized Image Pyramid Network for Single Shot Detector
CVPR 2019
Learning Rich Features at High-Speed for Single-Shot Object Detection
ICCV 2019
Mask-Guided Attention Network for Occluded Pedestrian Detection
ICCV 2019
Enriched Feature Guided Refinement Network for Object Detection
ICCV 2019