Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation
AAAI 2024
Exploiting Polarized Material Cues for Robust Car Detection
AAAI 2024
Mitigating Idiom Inconsistency: A Multi-Semantic Contrastive Learning Method for Chinese Idiom Reading Comprehension
AAAI 2024
SparseGNV: Generating Novel Views of Indoor Scenes with Sparse RGB-D Images
AAAI 2024
HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations
AAAI 2024
Bi-directional Adapter for Multimodal Tracking
AAAI 2024
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
AAAI 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
AAAI 2024
FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval
AAAI 2024
DIUSum: Dynamic Image Utilization for Multimodal Summarization
AAAI 2024
DocFormerv2: Local Features for Document Understanding
AAAI 2024
Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding
AAAI 2024
Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation
AAAI 2024
Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation
EMNLP 2024
DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task
EMNLP 2024
Multilingual Synopses of Movie Narratives: A Dataset for Vision-Language Story Understanding
EMNLP 2024
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
EMNLP 2024
MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
EMNLP 2024
Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA
EMNLP 2024
Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation
EMNLP 2024
Retrieval-enriched zero-shot image classification in low-resource domains
EMNLP 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
EMNLP 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
EMNLP 2024
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
EMNLP 2024