Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition
CVPR 2023
MetaCLUE: Towards Comprehensive Visual Metaphors Research
CVPR 2023
Gloss Attention for Gloss-Free Sign Language Translation
CVPR 2023
Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
CVPR 2023
Procedure-Aware Pretraining for Instructional Video Understanding
CVPR 2023
Adaptive Zone-Aware Hierarchical Planner for Vision-Language Navigation
CVPR 2023
EXIF As Language: Learning Cross-Modal Associations Between Images and Camera Metadata
CVPR 2023
Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style
CVPR 2023
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
CVPR 2023
Ham2Pose: Animating Sign Language Notation Into Pose Sequences
CVPR 2023
METransformer: Radiology Report Generation by Transformer With Multiple Learnable Expert Tokens
CVPR 2023
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language
CVPR 2023
Focus on Details: Online Multi-Object Tracking With Diverse Fine-Grained Representation
CVPR 2023
Critical Learning Periods for Multisensory Integration in Deep Networks
CVPR 2023
When does CLIP generalize better than unimodal models? When judging human-centric concepts
ACL 2022
Video Language Co-Attention with Multimodal Fast-Learning Feature Fusion for VideoQA
ACL 2022
Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
CVPR 2022
Finding Fallen Objects via Asynchronous Audio-Visual Integration
CVPR 2022
Retrieval-Based Spatially Adaptive Normalization for Semantic Image Synthesis
CVPR 2022
Cross-Modal Representation Learning for Zero-Shot Action Recognition
CVPR 2022
Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices?
CVPR 2022
A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres
CVPR 2022
Learning Based Multi-Modality Image and Video Compression
CVPR 2022
Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources
CVPR 2022
Stand-Alone Inter-Frame Attention in Video Models
CVPR 2022