Rita Cucchiara
52 papers · 2015–2026 · 10 top CS/AI conferences
Achievements
Keyword Pioneer
Taxonomy Completionist (10)
Interdisciplinary Bridge
Renaissance Researcher (5)
Conference Polyglot (10)
Dynamic Duo (24)
Deep Specialist (13)
Topic Evolution
Keyword Champion (2)
Prolific Year (12)
The Questioner (5)
Keyword Collector (232)
Trend Setter
Century Club (52)
Conference Pioneer
Unstoppable (10)
Conferences
CVPR (18)
ICCV (9)
WACV (7)
ECCV (6)
NIPS (4)
ICLR (3)
IJCAI (2)
ACL (1)
AISTATS (1)
ICML (1)
Keywords
multimodal large language model (5)
image captioning (5)
semantic segmentation (5)
multimodal learning (4)
diffusion model (4)
vision transformer (4)
vision-language model (3)
convolutional neural network (3)
image generation (3)
self-supervised learning (3)
handwritten text generation (3)
autoregressive model (3)
vision language model (3)
open-vocabulary segmentation (2)
visual attention (2)
gaze prediction (2)
multi-object tracking (2)
style transfer (2)
autoregressive transformer (2)
video captioning (2)
Papers
Sketch2Stitch: GANs for Abstract Sketch-Based Dress Synthesis
WACV 2026
Autoregressive Styled Text Image Generation, but Make it Reliable
WACV 2026
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
CVPR 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025
Hyperbolic Safety-Aware Vision-Language Models
CVPR 2025
Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios
WACV 2025
Perceive Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
WACV 2025
TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes
WACV 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
IJCAI 2025
A Second-Order Perspective on Model Compositionality and Incremental Learning
ICLR 2025
Diffusion Transformers for Tabular Data Time Series Generation
ICLR 2025
Causal Graphical Models for Vision-Language Compositional Understanding
ICLR 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
ICCV 2025
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
ICCV 2025
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
ICCV 2025
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
ICCV 2025
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CVPR 2025
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
ECCV 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
NIPS 2024
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
NIPS 2024
Is Multiple Object Tracking a Matter of Specialization?
NIPS 2024
The Revolution of Multimodal Large Language Models: A Survey
ACL 2024
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
CVPR 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
ECCV 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
ECCV 2024
Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
ECCV 2024
Trends, Applications, and Challenges in Human Attention Modelling
IJCAI 2024
FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval
WACV 2024
What's Outside the Intersection? Fine-Grained Error Analysis for Semantic Segmentation Beyond IoU
WACV 2024
Input Perturbation Reduces Exposure Bias in Diffusion Models
ICML 2023
Handwritten Text Generation From Visual Archetypes
CVPR 2023
TrackFlow: Multi-Object tracking with Normalizing Flows
ICCV 2023
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
ICCV 2023
With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning
ICCV 2023
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
CVPR 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
CVPR 2023
How Many Observations Are Enough? Knowledge Distillation for Trajectory Forecasting
CVPR 2022
Maximum Class Separation as Inductive Bias in One Matrix
NIPS 2022
Dress Code: High-Resolution Multi-Category Virtual Try-On
ECCV 2022
MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?
ICCV 2021
Meshed-Memory Transformer for Image Captioning
CVPR 2020
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
CVPR 2020
Conditional Channel Gated Networks for Task-Aware Continual Learning
CVPR 2020
Latent Space Autoregression for Novelty Detection
CVPR 2019
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
CVPR 2019
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation
CVPR 2019
Classifying Signals on Irregular Domains via Convolutional Cluster Pooling
AISTATS 2019
Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World
ECCV 2018
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers
CVPR 2018
POSEidon: Face-From-Depth for Driver Pose Estimation
CVPR 2017
Hierarchical Boundary-Aware Neural Encoder for Video Captioning
CVPR 2017
Learning to Divide and Conquer for Online Multi-Target Tracking
ICCV 2015