Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
ACL 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
ACL 2025
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
ACL 2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
ACL 2025
ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT
ACL 2025
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
ACL 2025
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
ACL 2025
MotiR: Motivation-aware Retrieval for Long-Tail Recommendation
ACL 2025
A Character-Centric Creative Story Generation via Imagination
ACL 2025
Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion
ACL 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
ACL 2025
DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning
ACL 2025
Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities
ACL 2025
VCD: A Dataset for Visual Commonsense Discovery in Images
ACL 2025
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis
ACL 2025
From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities
ACL 2025
P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
ACL 2025
A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds
ACL 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
ACL 2025
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching
ACL 2025
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
ACL 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
ACL 2025
VideoRAG: Retrieval-Augmented Generation over Video Corpus
ACL 2025
BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle
ACL 2025
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models
ACL 2025