Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
Two-Stream Network for Sign Language Recognition and Translation
NIPS 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
NIPS 2022
Cross-Linked Unified Embedding for cross-modality representation learning
NIPS 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
NIPS 2022
Time-Conditioned Dances with Simplicial Complexes: Zigzag Filtration Curve based Supra-Hodge Convolution Networks for Time-series Forecasting
NIPS 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
NIPS 2022
Touch and Go: Learning from Human-Collected Vision and Touch
NIPS 2022
MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-text Matching
NIPS 2022
Egocentric Video-Language Pretraining
NIPS 2022
Fine-Grained Semantically Aligned Vision-Language Pre-Training
NIPS 2022
Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture
NIPS 2022
Paraphrasing Is All You Need for Novel Object Captioning
NIPS 2022
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments
NIPS 2022
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
NIPS 2022
Divert More Attention to Vision-Language Tracking
NIPS 2022
Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection
NIPS 2022
Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
NIPS 2022
Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization
NIPS 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
NIPS 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
NIPS 2022
OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
NIPS 2022
Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
NIPS 2022
Robustness Analysis of Video-Language Models Against Visual and Language Perturbations
NIPS 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
NIPS 2022
Dense Interspecies Face Embedding
NIPS 2022