Papers
Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data
EMNLP 2025
Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output
COLING 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
IJCAI 2025
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
CVPR 2025
MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction
ACL 2025
End-to-End Multi-Modal Diffusion Mamba
ICCV 2025