Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
ACL 2025
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
CVPR 2025
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
ACL 2025
MotiR: Motivation-aware Retrieval for Long-Tail Recommendation
ACL 2025
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
ACL 2025
Distilling Multi-modal Large Language Models for Autonomous Driving
CVPR 2025
VCD: A Dataset for Visual Commonsense Discovery in Images
ACL 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
ACL 2025
DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning
ACL 2025
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
CVPR 2025
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
ACL 2025
SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches
IJCAI 2025
P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
ACL 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
CVPR 2025
Multimodal Prior Learning with Double Constraint Alignment for Snapshot Spectral Compressive Imaging
IJCAI 2025
A Cross-Modal Densely Guided Knowledge Distillation Based on Modality Rebalancing Strategy for Enhanced Unimodal Emotion Recognition
IJCAI 2025
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
EMNLP 2025
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
CVPR 2025
Screening, Rectifying, and Re-Screening: A Unified Framework for Tuning Vision-Language Models with Noisy Labels
IJCAI 2025
BMIP: Bi-directional Modality Interaction Prompt Learning for VLM
IJCAI 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
CVPR 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
CVPR 2025
Consistency-Aware Padding for Incomplete Multi-Modal Alignment Clustering Based on Self-Repellent Greedy Anchor Search
IJCAI 2025
Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning
IJCAI 2025
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
AAAI 2025