Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications
EMNLP 2025
FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring
AAAI 2025
TaxaBind: A Unified Embedding Space for Ecological Applications
WACV 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
CVPR 2025
Learning to Highlight Audio by Watching Movies
CVPR 2025
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model
ACL 2025
Progressive Multimodal Reasoning via Active Retrieval
ACL 2025
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
EMNLP 2025
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors
ACL 2025
Latency Robust Cooperative Perception using Asynchronous Feature Fusion
WACV 2025
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
CVPR 2025
Towards Effective and Efficient Continual Pre-training of Large Language Models
ACL 2025
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
ACL 2025
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
CVPR 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
ACL 2025
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
CVPR 2025
VLSBench: Unveiling Visual Leakage in Multimodal Safety
ACL 2025
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
WACV 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
CVPR 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
CVPR 2025
A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis
ACL 2025
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025
Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
ACL 2025
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
CVPR 2025