Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Cross-Constrained Progressive Inference for 3D Hand Pose Estimation with Dynamic Observer-Decision-Adjuster Networks
AAAI 2024
SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views
AAAI 2024
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
AAAI 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
Improving Audio-Visual Segmentation with Bidirectional Generation
AAAI 2024
Prompting Multi-Modal Image Segmentation with Semantic Grouping
AAAI 2024
Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation
EMNLP 2024
DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task
EMNLP 2024
Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation
AAAI 2024
Multilingual Synopses of Movie Narratives: A Dataset for Vision-Language Story Understanding
EMNLP 2024
Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches
EMNLP 2024
COMMA: Co-articulated Multi-Modal Learning
AAAI 2024
MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos
EMNLP 2024
Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA
EMNLP 2024
LAMM: Label Alignment for Multi-Modal Prompt Learning
AAAI 2024
Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation
EMNLP 2024
Retrieval-enriched zero-shot image classification in low-resource domains
EMNLP 2024
SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger
AAAI 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
EMNLP 2024
VIEWS: Entity-Aware News Video Captioning
EMNLP 2024
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
AAAI 2024
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
EMNLP 2024
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
EMNLP 2024
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI 2024
CARER - ClinicAl Reasoning-Enhanced Representation for Temporal Health Risk Prediction
EMNLP 2024