Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
CVPR 2022
Audio-Adaptive Activity Recognition Across Video Domains
CVPR 2022
Cross-Modal Perceptionist: Can Face Geometry Be Gleaned From Voices?
CVPR 2022
More Than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
CVPR 2022
Learning Based Multi-Modality Image and Video Compression
CVPR 2022
PluGeN: Multi-Label Conditional Generation from Pre-trained Models
AAAI 2022
Omnivore: A Single Model for Many Visual Modalities
CVPR 2022
Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification
IJCAI 2021
Two-Stream Convolution Augmented Transformer for Human Activity Recognition
AAAI 2021
Embracing Domain Differences in Fake News: Cross-domain Fake News Detection using Multi-modal Data
AAAI 2021
Visual Relation Detection using Hybrid Analogical Learning
AAAI 2021
Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis
AAAI 2021
Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling
AAAI 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
AAAI 2021
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation
AAAI 2021
RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving
AAAI 2021
SMIL: Multimodal Learning with Severely Missing Modality
AAAI 2021
CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval
AAAI 2021
Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval
AAAI 2021
Semantic Grouping Network for Video Captioning
AAAI 2021
Audio-Visual Localization by Synthetic Acoustic Image Generation
AAAI 2021
Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding
AAAI 2021
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
AAAI 2021
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
AAAI 2021
Binaural Audio-Visual Localization
AAAI 2021