Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
ACL 2023
Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
ACL 2023
Non-Sequential Graph Script Induction via Multimedia Grounding
ACL 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
ACL 2023
Multilingual Conceptual Coverage in Text-to-Image Models
ACL 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
ACL 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
ACL 2023
Layer-wise Fusion with Modality Independence Modeling for Multi-modal Emotion Recognition
ACL 2023
Learning To Name Classes for Vision and Language Models
CVPR 2023
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator
CVPR 2023
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
CVPR 2023
Co-Speech Gesture Synthesis by Reinforcement Learning With Contrastive Pre-Trained Rewards
CVPR 2023
AVFace: Towards Detailed Audio-Visual 4D Face Reconstruction
CVPR 2023
GRES: Generalized Referring Expression Segmentation
CVPR 2023
Freestyle Layout-to-Image Synthesis
CVPR 2023
Frame-Event Alignment and Fusion Network for High Frame Rate Tracking
CVPR 2023
DA-DETR: Domain Adaptive Detection Transformer With Information Fusion
CVPR 2023
Natural Language-Assisted Sign Language Recognition
CVPR 2023
Novel-View Acoustic Synthesis
CVPR 2023
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
CVPR 2023
Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
CVPR 2023
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
CVPR 2023
Language Adaptive Weight Generation for Multi-Task Visual Grounding
CVPR 2023
ImageBind: One Embedding Space To Bind Them All
CVPR 2023
Category Query Learning for Human-Object Interaction Classification
CVPR 2023