Multi-Modal Learning

3194 directly classified papers

Papers

Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip? EMNLP 2025

VQA-Augmented Machine Translation with Cross-Modal Contrastive Learning EMNLP 2025

DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding EMNLP 2025

Text or Pixels? Evaluating Efficiency and Understanding of LLMs with Visual Text Inputs EMNLP 2025

Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data EMNLP 2025

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning EMNLP 2025

Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs EMNLP 2025

BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion EMNLP 2025

HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals EMNLP 2025

Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization EMNLP 2025

VIVA+: Human-Centered Situational Decision-Making EMNLP 2025

MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models EMNLP 2025

Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph EMNLP 2025

From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation EMNLP 2025

Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models EMNLP 2025

ImageEval 2025: The First Arabic Image Captioning Shared Task EMNLP 2025

Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications EMNLP 2025

GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation EMNLP 2025

Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain EMNLP 2025

SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision EMNLP 2025

Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing CVPR 2025

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models CVPR 2025

CTRL-O: Language-Controllable Object-Centric Visual Representation Learning CVPR 2025

Can Vision Language Models Understand Mimed Actions? ACL 2025

MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps CVPR 2025