multimodal learning
4622 papers
Also known as
VLM
VLLM
MM
VLA
MLLMS
MLM
MML
MULLM
LMM
MLLM
MMT
Papers
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
EMNLP 2025
PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes
EMNLP 2025
Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction
EMNLP 2025
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
EMNLP 2025
Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models
EMNLP 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
EMNLP 2025
Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks
RSS 2025