Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis
NIPS 2022
Micro and Macro Level Graph Modeling for Graph Variational Auto-Encoders
NIPS 2022
FETA: Towards Specializing Foundational Models for Expert Task Applications
NIPS 2022
Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
NIPS 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
NIPS 2022
mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors
NIPS 2022
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
NIPS 2022
Multi-Modal Alignment Using Representation Codebook
CVPR 2022
Stereo Depth From Events Cameras: Concentrate and Focus on the Future
CVPR 2022
Boosting Crowd Counting via Multifaceted Attention
CVPR 2022
Audio-Visual Generalised Zero-Shot Learning With Cross-Modal Attention and Language
CVPR 2022
Language As Queries for Referring Video Object Segmentation
CVPR 2022
Mix and Localize: Localizing Sound Sources in Mixtures
CVPR 2022
Expressive Talking Head Generation With Granular Audio-Visual Control
CVPR 2022
Dynamic 3D Gaze From Afar: Deep Gaze Estimation From Temporal Eye-Head-Body Coordination
CVPR 2022
Conditional Prompt Learning for Vision-Language Models
CVPR 2022
Talking Face Generation With Multilingual TTS
CVPR 2022
Open-Domain, Content-Based, Multi-Modal Fact-Checking of Out-of-Context Images via Online Resources
CVPR 2022
EnvEdit: Environment Editing for Vision-and-Language Navigation
CVPR 2022
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
CVPR 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
CVPR 2022
FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-Identification
CVPR 2022
Beyond Fixation: Dynamic Window Visual Transformer
CVPR 2022
Dual-Generator Face Reenactment
CVPR 2022
PoseKernelLifter: Metric Lifting of 3D Human Pose Using Sound
CVPR 2022