Mohamed Elhoseiny
82 papers · 2013–2026 · 12 top CS/AI conferences
Achievements
Conference Polyglot (11)
Academic Marathon (13)
Keyword Pioneer
Interdisciplinary Bridge
Cross-Pollinator (11)
Hot Topic Early Bird
Dynamic Duo (16)
Grand Slam
Deep Specialist (18)
Topic Evolution
Keyword Champion (2)
The Questioner (2)
Trend Setter
Keyword Collector (272)
Prolific Year (16)
Unstoppable (12)
Century Club (80)
Conference Pioneer
Conferences
CVPR (19)
ICCV (16)
ICLR (12)
ECCV (10)
NIPS (6)
WACV (5)
EMNLP (4)
ICML (4)
AAAI (3)
CORL (1)
EACL (1)
MICCAI (1)
Keywords
image captioning (7)
zero-shot learning (6)
image generation (5)
generative adversarial network (5)
object detection (4)
object recognition (4)
point cloud (4)
large language model (4)
multimodal learning (4)
graph neural network (3)
generative model (3)
convolutional neural network (3)
multimodal large language model (3)
emotion recognition (3)
video understanding (3)
semantic segmentation (2)
data augmentation (2)
knowledge transfer (2)
knowledge distillation (2)
text generation (2)
Papers
M-MiniGPT4: Multilingual VLLM Alignment via Translated Data
EACL 2026
Step-by-step Layered Design Generation
AAAI 2026
iMotion-LLM: Instruction-Conditioned Trajectory Generation
WACV 2026
Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis
WACV 2026
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
ICCV 2025
Query-based Knowledge Transfer for Heterogeneous Learning Environments
ICLR 2025
Local Masked Reconstruction for Efficient Self-Supervised Learning on High-Resolution Images
WACV 2025
StoryGPT-V: Large Language Models as Consistent Story Visualizers
CVPR 2025
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
CVPR 2025
Temporal Model-Based Federated Active Medical Image Classification
MICCAI 2025
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
ICML 2025
InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
EMNLP 2025
Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions
EMNLP 2025
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
ICLR 2025
ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge
ICLR 2025
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
ICCV 2025
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
ICCV 2025
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
ICCV 2025
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
ICCV 2025
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
ECCV 2024
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
ICLR 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
EMNLP 2024
Overcoming Generic Knowledge Loss with Selective Parameter Update
CVPR 2024
Label Delay in Online Continual Learning
NIPS 2024
3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition
NIPS 2024
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
ICLR 2024
ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment
AAAI 2024
ShapeWalk: Compositional Shape Editing Through Language-Guided Chains
CVPR 2024
Adversarial Text to Continuous Image Generation
CVPR 2024
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
NIPS 2024
A Hybrid Graph Network for Complex Activity Detection in Video
WACV 2024
Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation
ICLR 2024
Uni3DL: A Unified Model for 3D Vision-Language Understanding
ECCV 2024
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
ECCV 2024
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
ECCV 2024
MoStGAN-V: Video Generation With Temporal Motion Styles
CVPR 2023
MammalNet: A Large-Scale Video Benchmark for Mammal Recognition and Behavior Understanding
CVPR 2023
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
ICCV 2023
Continual Zero-Shot Learning through Semantically Guided Generative Random Walks
ICCV 2023
SLAMB: Accelerated Large Batch Training with Sparse Communication
ICML 2023
OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
ICCV 2023
Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only
ICCV 2023
Value Memory Graph: A Graph-Structured World Model for Offline Reinforcement Learning
ICLR 2023
FishNet: A Large-scale Dataset and Benchmark for Fish Recognition, Detection, and Functional Trait Prediction
ICCV 2023
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
EMNLP 2022
Social-Implicit: Rethinking Trajectory Prediction Evaluation and the Effectiveness of Implicit Maximum Likelihood Estimation
ECCV 2022
StyleGAN-V: A Continuous Video Generator With the Price, Image Quality and Perks of StyleGAN2
CVPR 2022
Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification
ECCV 2022
3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language
WACV 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
NIPS 2022
3D CoMPaT: Composition of Materials on Parts of 3D Things
ECCV 2022
VisualGPT: Data-Efficient Adaptation of Pretrained Language Models for Image Captioning
CVPR 2022
RelTransformer: A Transformer-Based Long-Tail Visual Relationship Recognition
CVPR 2022
It Is Okay To Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection
CVPR 2022
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
NIPS 2022
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents
ICLR 2021
Motion Forecasting with Unlikelihood Training in Continuous Space
CORL 2021
ArtEmis: Affective Language for Visual Art
CVPR 2021
Adversarial Generation of Continuous Images
CVPR 2021
Exploring Long Tail Visual Relationship Recognition With Large Vocabulary
ICCV 2021
Aligning Latent and Image Spaces To Connect the Unconnectable
ICCV 2021
Class Normalization for (Continual)? Generalized Zero-Shot Learning
ICLR 2021
Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation
NIPS 2020
Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction
CVPR 2020
Uncertainty-guided Continual Learning with Bayesian Neural Networks
ICLR 2020
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes
ECCV 2020
Compositional Language Continual Learning
ICLR 2020
Large-Scale Visual Relationship Understanding
AAAI 2019
Efficient Lifelong Learning with A-GEM
ICLR 2019
Creativity Inspired Zero-Shot Learning
ICCV 2019
GDPP: Learning Diverse Generations using Determinantal Point Processes
ICML 2019
A Generative Adversarial Approach for Zero-Shot Learning From Noisy Texts
CVPR 2018
Memory Aware Synapses: Learning what (not) to forget
ECCV 2018
Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance
ECCV 2018
Relationship Proposal Networks
CVPR 2017
Link the Head to the "Beak": Zero Shot Learning From Noisy Text Description at Part Precision
CVPR 2017
SPDA-CNN: Unifying Semantic Part Detection and Abstraction for Fine-Grained Recognition
CVPR 2016
A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation
ICML 2016
Learning Hypergraph-Regularized Attribute Predictors
CVPR 2015
Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions
ICCV 2013