Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip?
EMNLP 2025
VQA-Augmented Machine Translation with Cross-Modal Contrastive Learning
EMNLP 2025
DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
EMNLP 2025
Text or Pixels? Evaluating Efficiency and Understanding of LLMs with Visual Text Inputs
EMNLP 2025
Bridging Semantic and Modality Gaps in Zero-Shot Captioning via Retrieval from Synthetic Data
EMNLP 2025
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
EMNLP 2025
Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs
EMNLP 2025
BcQLM: Efficient Vision-Language Understanding with Distilled Q-Gated Cross-Modal Fusion
EMNLP 2025
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals
EMNLP 2025
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
EMNLP 2025
VIVA+: Human-Centered Situational Decision-Making
EMNLP 2025
MAFMO: Multi-modal Adaptive Fusion with Meta-template Optimization for Vision-Language Models
EMNLP 2025
Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph
EMNLP 2025
From Generation to Detection: A Multimodal Multi-Task Dataset for Benchmarking Health Misinformation
EMNLP 2025
Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models
EMNLP 2025
ImageEval 2025: The First Arabic Image Captioning Shared Task
EMNLP 2025
Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications
EMNLP 2025
GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation
EMNLP 2025
Factors Affecting Translation Quality in In-context Learning for Multilingual Medical Domain
EMNLP 2025
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
EMNLP 2025
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
CVPR 2025
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
CVPR 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
CVPR 2025
Can Vision Language Models Understand Mimed Actions?
ACL 2025
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
CVPR 2025