Mohamed Elhoseiny

82 papers · 2013–2026 · 12 top CS/AI conferences

Achievements

🌍 Conference Polyglot (11) 🏃 Academic Marathon (13) 🧭 Keyword Pioneer 🌉 Interdisciplinary Bridge 🐝 Cross-Pollinator (11)

🐣 Hot Topic Early Bird 🤝 Dynamic Duo (16) 🏆 Grand Slam 🔬 Deep Specialist (18) 🧬 Topic Evolution 🏆 Keyword Champion (2) ❓ The Questioner (2) 📈 Trend Setter 🗃️ Keyword Collector (272) ⚡ Prolific Year (16) 🔥 Unstoppable (12) 💎 Century Club (80) 🚀 Conference Pioneer

Conferences

CVPR (19) ICCV (16) ICLR (12) ECCV (10) NIPS (6) WACV (5) EMNLP (4) ICML (4) AAAI (3) CORL (1) EACL (1) MICCAI (1)

Top co-authors

Jun Chen (16) Xiaoqian Shen (11) Ahmed Elgammal (8) Deyao Zhu (8) Jian Ding (7) Ivan Skorokhodov (7) Wenxuan Zhang (7) Xiang Li (6) Youssef Mohamed (6) Faizan Farooq Khan (6)

Keywords

image captioning (7) zero-shot learning (6) image generation (5) generative adversarial network (5) object detection (4) object recognition (4) point cloud (4) large language model (4) multimodal learning (4) graph neural network (3) generative model (3) convolutional neural network (3) multimodal large language model (3) emotion recognition (3) video understanding (3) semantic segmentation (2) data augmentation (2) knowledge transfer (2) knowledge distillation (2) text generation (2)

Papers

M-MiniGPT4: Multilingual VLLM Alignment via Translated Data · EACL 2026
Step-by-step Layered Design Generation · AAAI 2026
iMotion-LLM: Instruction-Conditioned Trajectory Generation · WACV 2026
Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis · WACV 2026
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description · ICCV 2025
Query-based Knowledge Transfer for Heterogeneous Learning Environments · ICLR 2025
Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images · WACV 2025
StoryGPT-V: Large Language Models as Consistent Story Visualizers · CVPR 2025
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents · CVPR 2025
Temporal Model-Based Federated Active Medical Image Classification · MICCAI 2025
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding · ICML 2025
InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows · EMNLP 2025
Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions · EMNLP 2025
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models · ICLR 2025
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge · ICLR 2025
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation · ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning · ICCV 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding · ICCV 2025
Diffusion-Based Imaginative Coordination for Bimanual Manipulation · ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs · ICCV 2025
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs · ICCV 2025
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations · ECCV 2024
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models · ICLR 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages · EMNLP 2024
Overcoming Generic Knowledge Loss with Selective Parameter Update · CVPR 2024
Label Delay in Online Continual Learning · NIPS 2024
3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition · NIPS 2024
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding · ICLR 2024
ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment · AAAI 2024
ShapeWalk: Compositional Shape Editing Through Language-Guided Chains · CVPR 2024
Adversarial Text to Continuous Image Generation · CVPR 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding · NIPS 2024
A Hybrid Graph Network for Complex Activity Detection in Video · WACV 2024
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation · ICLR 2024
Uni3DL: A Unified Model for 3D Vision-Language Understanding · ECCV 2024
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos · ECCV 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time · ECCV 2024
MoStGAN-V: Video Generation With Temporal Motion Styles · CVPR 2023
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding · CVPR 2023
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models · ICCV 2023
Continual Zero-Shot Learning through Semantically Guided Generative Random Walks · ICCV 2023
SLAMB: Accelerated Large Batch Training with Sparse Communication · ICML 2023
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images? · ICCV 2023
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only · ICCV 2023
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning · ICLR 2023
FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction · ICCV 2023
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture · EMNLP 2022
Social-Implicit: Rethinking Trajectory Prediction Evaluation and the Effectiveness of Implicit Maximum Likelihood Estimation · ECCV 2022
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2 · CVPR 2022
Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification · ECCV 2022
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language · WACV 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding · NIPS 2022
3D CoMPaT: Composition of Materials on Parts of 3D Things · ECCV 2022
VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning · CVPR 2022
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition · CVPR 2022
It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection · CVPR 2022
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies · NIPS 2022
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents · ICLR 2021
Motion Forecasting with Unlikelihood Training in Continuous Space · CORL 2021
ArtEmis: Affective Language for Visual Art · CVPR 2021
Adversarial Generation of Continuous Images · CVPR 2021
Exploring Long Tail Visual Relationship Recognition With Large Vocabulary · ICCV 2021
Aligning Latent and Image Spaces To Connect the Unconnectable · ICCV 2021
Class Normalization for (Continual)? Generalized Zero-Shot Learning · ICLR 2021
Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation · NIPS 2020
Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction · CVPR 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks · ICLR 2020
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes · ECCV 2020
Compositional Language Continual Learning · ICLR 2020
Large-Scale Visual Relationship Understanding · AAAI 2019
Efficient Lifelong Learning with A-GEM · ICLR 2019
Creativity Inspired Zero-Shot Learning · ICCV 2019
GDPP: Learning Diverse Generations using Determinantal Point Processes · ICML 2019
A Generative Adversarial Approach for Zero-Shot Learning From Noisy Texts · CVPR 2018
Memory Aware Synapses: Learning what (not) to forget · ECCV 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance · ECCV 2018
Relationship Proposal Networks · CVPR 2017
Link the Head to the "Beak": Zero Shot Learning From Noisy Text Description at Part Precision · CVPR 2017
SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition · CVPR 2016
A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation · ICML 2016
Learning Hypergraph-Regularized Attribute Predictors · CVPR 2015
Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions · ICCV 2013