Research Explorer
Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion
WACV 2025
DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification
WACV 2025
ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos
WACV 2025
To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation
WACV 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
WACV 2025
When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models
NAACL 2025
MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval
ACL 2025
Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs
NAACL 2025
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
ICCV 2025
From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments
NAACL 2025
StuD: A Multimodal Approach for Stuttering Detection with RAG and Fusion Strategies
IJCNLP 2025
Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching
ICCV 2025
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
NAACL 2025
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
ACL 2025
MultiCoPIE: A Multilingual Corpus of Potentially Idiomatic Expressions for Cross-lingual PIE Disambiguation
NAACL 2025
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
ICCV 2025
Dynamic Group Detection using VLM-augmented Temporal Groupness Graph
ICCV 2025
Dynamic Interactive Bimodal Hypergraph Networks for Emotion Recognition in Conversations
AAAI 2025
Capturing the Unseen: Vision-Free Facial Motion Capture Using Inertial Measurement Units
AAAI 2025
MSR: A Multifaceted Self-Retrieval Framework for Microscopic Cascade Prediction
AAAI 2025
Can Large Language Models Classify and Generate Antimicrobial Resistance Genes?
ACL 2025
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion
CVPR 2025
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
CVPR 2025
PAI at SemEval-2025 Task 11: A Large Language Model Ensemble Strategy for Text-Based Emotion Detection
ACL 2025
Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models
CVPR 2025