Hisham Cholakkal
53 papers · 2016–2026 · 11 conferences · across top CS/AI conferences
Achievements
🐝 Cross-Pollinator (6)
🌍 Conference Polyglot (11)
🌉 Interdisciplinary Bridge
🏃 Academic Marathon (10)
🌈 Renaissance Researcher (8)
🐣 Hot Topic Early Bird
👥 Mega-Team (69)
🏆 Grand Slam
🤝 Dynamic Duo (35)
🔬 Deep Specialist (14)
🧬 Topic Evolution
⚡ Prolific Year (8)
💎 Century Club (51)
🚀 Conference Pioneer
🔥 Unstoppable (8)
🗃️ Keyword Collector (200)
Conferences
CVPR (9)
ECCV (9)
ICCV (9)
EMNLP (6)
WACV (6)
ACL (5)
AAAI (3)
MICCAI (2)
NeurIPS (2)
ICLR (1)
ICML (1)
Keywords
object detection (8)
large language model (7)
multimodal learning (5)
large multimodal model (5)
instruction tuning (5)
vision-language model (4)
domain adaptation (3)
person re-identification (3)
instance segmentation (3)
convolutional neural network (3)
image classification (3)
semantic segmentation (3)
fine-grained classification (2)
visual reasoning (2)
benchmark evaluation (2)
weakly supervised learning (2)
multilingual NLP (2)
visual question answering (2)
image segmentation (2)
medical imaging (2)
Papers
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
AAAI 2026
MedROV: Towards Real-Time Open-Vocabulary Detection Across Diverse Medical Imaging Modalities
WACV 2026
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
ACL 2026
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
ICCV 2025
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
ICCV 2025
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning
WACV 2025
PALO: A Polyglot Large Multimodal Model for 5B People
WACV 2025
MAviS: A Multimodal Conversational Assistant For Avian Species
EMNLP 2025
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
EMNLP 2025
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
ICLR 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions
MICCAI 2025
A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
EMNLP 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
EMNLP 2025
TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation
WACV 2024
DDAM-PS: Diligent Domain Adaptive Mixer for Person Search
WACV 2024
BiMediX: Bilingual Medical Mixture of Experts LLM
EMNLP 2024
Decoupled Training for Semi-supervised Medical Image Segmentation with Worst-Case-Aware Learning
MICCAI 2024
Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation
ICML 2024
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection
ECCV 2024
PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
ECCV 2024
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
ECCV 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
ECCV 2024
Semi-supervised Open-World Object Detection
AAAI 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
CVPR 2024
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models
ACL 2024
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection
CVPR 2023
3D Indoor Instance Segmentation in an Open-World
NeurIPS 2023
Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition
NeurIPS 2023
Person Image Synthesis via Denoising Diffusion Model
CVPR 2023
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM
EMNLP 2023
Generative Multiplane Neural Radiance for 3D-Aware Image Generation
ICCV 2023
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
ICCV 2023
SAT: Scale-Augmented Transformer for Person Search
WACV 2023
PSTR: End-to-End One-Step Person Search With Transformers
CVPR 2022
Video Instance Segmentation via Multi-Scale Spatio-Temporal Split Attention Transformer
ECCV 2022
DoodleFormer: Creative Sketch Drawing with Transformers
ECCV 2022
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
ICCV 2021
Handwriting Transformers
ICCV 2021
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
ECCV 2020
Fine-Grained Recognition: Accounting for Subtle Differences between Similar Classes
AAAI 2020
D2Det: Towards High Quality Object Detection and Instance Segmentation
CVPR 2020
Fixing Localization Errors to Improve Image Classification
ECCV 2020
Count- and Similarity-aware R-CNN for Pedestrian Detection
ECCV 2020
Object Counting and Instance Segmentation With Image-Level Supervision
CVPR 2019
Enriched Feature Guided Refinement Network for Object Detection
ICCV 2019
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization
ICCV 2019
Learning Rich Features at High-Speed for Single-Shot Object Detection
ICCV 2019
Backtracking ScSPM Image Classifier for Weakly Supervised Top-Down Saliency
CVPR 2016