Lorenzo Baraldi
25 papers · 2017–2026 · 7 top CS/AI conferences
Achievements
Interdisciplinary Bridge
Renaissance Researcher (6)
Academic Marathon (9)
Conference Polyglot (7)
Taxonomy Completionist (50)
Cross-Pollinator (15)
Deep Specialist (12)
Dynamic Duo (24)
Topic Evolution
Century Club (25)
Keyword Collector (112)
Prolific Year (8)
The Questioner (2)
Conferences
CVPR (10)
WACV (5)
ICCV (4)
ECCV (3)
ACL (1)
ICLR (1)
NeurIPS (1)
Keywords
multimodal learning (5)
image captioning (4)
multimodal large language model (4)
vision-language model (4)
semantic segmentation (4)
self-supervised learning (3)
vision language model (3)
video captioning (2)
missing modality (2)
large language model (2)
open-vocabulary segmentation (2)
visual grounding (2)
diffusion model (2)
vision transformer (1)
visual question answering (1)
object recognition (1)
metric learning (1)
sequence modeling (1)
feature extraction (1)
domain adaptation (1)
Papers
FG-TRACER: Tracing Information Flow in Multimodal Large Language Models in Free-Form Generation
WACV 2026
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CVPR 2025
Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios
WACV 2025
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
WACV 2025
Causal Graphical Models for Vision-Language Compositional Understanding
ICLR 2025
MissRAG: Addressing the Missing Modality Challenge in Multimodal Large Language Models
ICCV 2025
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
ICCV 2025
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
ICCV 2025
Hyperbolic Safety-Aware Vision-Language Models
CVPR 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
ECCV 2024
The Revolution of Multimodal Large Language Models: A Survey
ACL 2024
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
CVPR 2024
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
NeurIPS 2024
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
ECCV 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
ECCV 2024
FOSSIL: Free Open-Vocabulary Semantic Segmentation Through Synthetic References Retrieval
WACV 2024
What's Outside the Intersection? Fine-Grained Error Analysis for Semantic Segmentation Beyond IoU
WACV 2024
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
CVPR 2023
With a Little Help from Your Own Past: Prototypical Memory Networks for Image Captioning
ICCV 2023
Meshed-Memory Transformer for Image Captioning
CVPR 2020
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
CVPR 2019
Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-To-Image Translation
CVPR 2019
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers
CVPR 2018
Hierarchical Boundary-Aware Neural Encoder for Video Captioning
CVPR 2017