Rita Cucchiara

52 papers · 2015–2026 · 10 top CS/AI conferences

Achievements

🧭 Keyword Pioneer 🗺️ Taxonomy Completionist (10) 🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (5) 🌍 Conference Polyglot (10) 🤝 Dynamic Duo (24) 🔬 Deep Specialist (13) 🧬 Topic Evolution 🏆 Keyword Champion (2) ⚡ Prolific Year (12) ❓ The Questioner (5) 🗃️ Keyword Collector (232) 📈 Trend Setter 💎 Century Club (52) 🚀 Conference Pioneer 🔥 Unstoppable (10)

Conferences

CVPR (18) ICCV (9) WACV (7) ECCV (6) NIPS (4) ICLR (3) IJCAI (2) ACL (1) AISTATS (1) ICML (1)

Top co-authors

Lorenzo Baraldi (24) Marcella Cornia (24) Simone Calderara (12) Sara Sarto (7) Angelo Porrello (7) Luca Barsellotti (5) Silvia Cascianelli (4) Roberto Amoroso (4) Matteo Fabbri (4) Federico Cocchi (4)

Keywords

multimodal large language model (5) image captioning (5) semantic segmentation (5) multimodal learning (4) diffusion model (4) vision transformer (4) vision-language model (3) convolutional neural network (3) image generation (3) self-supervised learning (3) handwritten text generation (3) autoregressive model (3) vision language model (3) open-vocabulary segmentation (2) visual attention (2) gaze prediction (2) multi-object tracking (2) style transfer (2) autoregressive transformer (2) video captioning (2)

Papers

Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis WACV 2026
Autoregressive Styled Text Image Generation, but Make it Reliable WACV 2026
Zero-Shot Styled Text Image Generation, but Make It Autoregressive CVPR 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering CVPR 2025
Hyperbolic Safety-Aware Vision-Language Models CVPR 2025
Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios WACV 2025
Perceive Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries WACV 2025
TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes WACV 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives IJCAI 2025
A Second-Order Perspective on Model Compositionality and Incremental Learning ICLR 2025
Diffusion Transformers for Tabular Data Time Series Generation ICLR 2025
Causal Graphical Models for Vision-Language Compositional Understanding ICLR 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation ICCV 2025
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction ICCV 2025
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models ICCV 2025
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models ICCV 2025
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval CVPR 2025
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models ECCV 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration NIPS 2024
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments NIPS 2024
Is Multiple Object Tracking a Matter of Specialization? NIPS 2024
The Revolution of Multimodal Large Language Models: A Survey ACL 2024
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation CVPR 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities ECCV 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues ECCV 2024
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas ECCV 2024
Trends, Applications, and Challenges in Human Attention Modelling IJCAI 2024
FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval WACV 2024
What's Outside the Intersection? Fine-Grained Error Analysis for Semantic Segmentation Beyond IoU WACV 2024
Input Perturbation Reduces Exposure Bias in Diffusion Models ICML 2023
Handwritten Text Generation From Visual Archetypes CVPR 2023
TrackFlow: Multi-Object tracking with Normalizing Flows ICCV 2023
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing ICCV 2023
With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning ICCV 2023
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers CVPR 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation CVPR 2023
How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting CVPR 2022
Maximum Class Separation as Inductive Bias in One Matrix NIPS 2022
Dress Code: High-Resolution Multi-Category Virtual Try-On ECCV 2022
MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking? ICCV 2021
Meshed-Memory Transformer for Image Captioning CVPR 2020
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation CVPR 2020
Conditional Channel Gated Networks for Task-Aware Continual Learning CVPR 2020
Latent Space Autoregression for Novelty Detection CVPR 2019
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions CVPR 2019
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation CVPR 2019
Classifying Signals on Irregular Domains via Convolutional Cluster Pooling AISTATS 2019
Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World ECCV 2018
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers CVPR 2018
POSEidon: Face-From-Depth for Driver Pose Estimation CVPR 2017
Hierarchical Boundary-Aware Neural Encoder for Video Captioning CVPR 2017
Learning to Divide and Conquer for Online Multi-Target Tracking ICCV 2015