Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
NeurIPS 2024
Detecting and Grounding Important Characters in Visual Stories
AAAI 2023
Learning To Segment Every Referring Object Point by Point
CVPR 2023
GeoVLN: Learning Geometry-Enhanced Visual Representation With Slot Attention for Vision-and-Language Navigation
CVPR 2023
Combining Implicit-Explicit View Correlation for Light Field Semantic Segmentation
CVPR 2023
Improving Cross-Modal Retrieval With Set of Diverse Embeddings
CVPR 2023
Learning 3D Scene Priors With 2D Supervision
CVPR 2023
AVFormer: Injecting Vision Into Frozen Speech Models for Zero-Shot AV-ASR
CVPR 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
CVPR 2023
Logical Implications for Visual Question Answering Consistency
CVPR 2023
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation
CVPR 2023
Language-Guided Music Recommendation for Video via Prompt Analogies
CVPR 2023
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
CVPR 2023
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
CVPR 2023
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-Training Model
CVPR 2023
Advancing Visual Grounding With Scene Knowledge: Benchmark and Method
CVPR 2023
Noisy Correspondence Learning With Meta Similarity Correction
CVPR 2023
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
CVPR 2023
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
CVPR 2023
Referring Multi-Object Tracking
CVPR 2023
Learning To Fuse Monocular and Multi-View Cues for Multi-Frame Depth Estimation in Dynamic Scenes
CVPR 2023
Referring Image Matting
CVPR 2023
RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension
CVPR 2023
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
CVPR 2023
UnLoc: A Unified Framework for Video Localization Tasks
ICCV 2023