Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation
EMNLP 2024
Retrieval-enriched zero-shot image classification in low-resource domains
EMNLP 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
EMNLP 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
EMNLP 2024
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
EMNLP 2024
CARER - ClinicAl Reasoning-Enhanced Representation for Temporal Health Risk Prediction
EMNLP 2024
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts
EMNLP 2024
MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
EMNLP 2024
Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning
EMNLP 2024
Divide and Conquer Radiology Report Generation via Observation Level Fine-grained Pretraining and Prompt Tuning
EMNLP 2024
Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
EMNLP 2024
ConVQG: Contrastive Visual Question Generation with Multimodal Guidance
AAAI 2024
II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
ACL 2024
Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development
ACL 2024
PPTSER: A Plug-and-Play Tag-guided Method for Few-shot Semantic Entity Recognition on Visually-rich Documents
ACL 2024
MODDP: A Multi-modal Open-domain Chinese Dataset for Dialogue Discourse Parsing
ACL 2024
CLASP: Cross-modal Alignment Using Pre-trained Unimodal Models
ACL 2024
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
ACL 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
ACL 2024
Multi-modal Stance Detection: New Datasets and Model
ACL 2024
Enhanced BioT5+ for Molecule-Text Translation: A Three-Stage Approach with Data Distillation, Diverse Training, and Voting Ensemble
ACL 2024
Data Roaming and Quality Assessment for Composed Image Retrieval
AAAI 2024
SciMind: A Multimodal Mixture-of-Experts Model for Advancing Pharmaceutical Sciences
ACL 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024