Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training
ACL 2025
Deduce and Select Evidences with Language Models for Training-Free Video Goal Inference
WACV 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
ACL 2025
Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models
ACL 2025
A Strategic Coordination Framework of Small LMs Matches Large LMs in Data Synthesis
ACL 2025
VC4VG: Optimizing Video Captions for Text-to-Video Generation
EMNLP 2025
MIO: A Foundation Model on Multimodal Tokens
EMNLP 2025
TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection
EMNLP 2025
Audio-centric Video Understanding Benchmark without Text Shortcut
EMNLP 2025
R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models
EMNLP 2025
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
EMNLP 2025
LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
EMNLP 2025
VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
EMNLP 2025
HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
EMNLP 2025
LATTE: Learning to Think with Vision Specialists
EMNLP 2025
CoMMIT: Coordinated Multimodal Instruction Tuning
EMNLP 2025
AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction
EMNLP 2025
CEMTM: Contextual Embedding-based Multimodal Topic Modeling
EMNLP 2025
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
EMNLP 2025
ViDove: A Translation Agent System with Multimodal Context and Memory-Augmented Reasoning
EMNLP 2025
PresentAgent: Multimodal Agent for Presentation Video Generation
EMNLP 2025
MathBuddy: A Multimodal System for Affective Math Tutoring
EMNLP 2025
RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
EMNLP 2025
PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
EMNLP 2025
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
CVPR 2025