Lorenzo Baraldi

25 papers · 2017–2026 · 7 top CS/AI conferences

Achievements

🌉 Interdisciplinary Bridge 🌈 Renaissance Researcher (6) 🏃 Academic Marathon (9) 🌍 Conference Polyglot (7) 🗺️ Taxonomy Completionist (50) 🐝 Cross-Pollinator (15) 🔬 Deep Specialist (12) 🤝 Dynamic Duo (24) 🧬 Topic Evolution 💎 Century Club (25) 🗃️ Keyword Collector (112) ⚡ Prolific Year (8) ❓ The Questioner (2)

Conferences

CVPR (10) WACV (5) ICCV (4) ECCV (3) ACL (1) ICLR (1) NeurIPS (1)

Top co-authors

Rita Cucchiara (24) Marcella Cornia (17) Sara Sarto (6) Luca Barsellotti (5) Federico Cocchi (4) Roberto Amoroso (4) Costantino Grana (3) Nicholas Moratelli (3) Vittorio Pipoli (3) Federico Bolelli (3)

Keywords

multimodal learning (5) image captioning (4) multimodal large language model (4) vision-language model (4) semantic segmentation (4) self-supervised learning (3) vision language model (3) video captioning (2) missing modality (2) large language model (2) open-vocabulary segmentation (2) visual grounding (2) diffusion model (2) vision transformer (1) visual question answering (1) object recognition (1) metric learning (1) sequence modeling (1) feature extraction (1) domain adaptation (1)

Papers

FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation (WACV 2026)
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval (CVPR 2025)
Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios (WACV 2025)
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries (WACV 2025)
Causal Graphical Models for Vision-Language Compositional Understanding (ICLR 2025)
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models (ICCV 2025)
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation (ICCV 2025)
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models (ICCV 2025)
Hyperbolic Safety-Aware Vision-Language Models (CVPR 2025)
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering (CVPR 2025)
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models (ECCV 2024)
The Revolution of Multimodal Large Language Models: A Survey (ACL 2024)
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments (NeurIPS 2024)
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities (ECCV 2024)
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues (ECCV 2024)
FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval (WACV 2024)
What's Outside the Intersection? Fine-Grained Error Analysis for Semantic Segmentation Beyond IoU (WACV 2024)
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023)
With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning (ICCV 2023)
Meshed-Memory Transformer for Image Captioning (CVPR 2020)
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions (CVPR 2019)
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation (CVPR 2019)
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers (CVPR 2018)
Hierarchical Boundary-Aware Neural Encoder for Video Captioning (CVPR 2017)