multimodal learning

4622 papers

Also known as

VLM VLLM MM VLA MLLMS MLM MML MULLM LMM MLLM MMT

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification EMNLP 2025

MIRROR: Multimodal Cognitive Reframing Therapy for Rolling with Resistance EMNLP 2025

LVLMs are Bad at Overhearing Human Referential Communication EMNLP 2025

MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models EMNLP 2025

BANMIME: Misogyny Detection with Metaphor Explanation on Bangla Memes EMNLP 2025

Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction EMNLP 2025

PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes EMNLP 2025

How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads EMNLP 2025

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents EMNLP 2025

MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute Prompting EMNLP 2025

Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens EMNLP 2025

RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction EMNLP 2025

Multimodal Neural Machine Translation: A Survey of the State of the Art EMNLP 2025

LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition EMNLP 2025

Seeing Culture: A Benchmark for Visual Reasoning and Grounding EMNLP 2025

Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation EMNLP 2025

Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning EMNLP 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories EMNLP 2025

Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation EMNLP 2025

Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation EMNLP 2025

Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D EMNLP 2025

MAviS: A Multimodal Conversational Assistant For Avian Species EMNLP 2025

VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model EMNLP 2025

Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness EMNLP 2025

What are Foundation Models Cooking in the Post-Soviet World? EMNLP 2025