Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
CVPR 2025
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
CVPR 2025
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
CVPR 2025
Semantic and Sequential Alignment for Referring Video Object Segmentation
CVPR 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
CVPR 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
CVPR 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
CVPR 2025
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
CVPR 2025
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
CVPR 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
CVPR 2025
Mamba-Reg: Vision Mamba Also Needs Registers
CVPR 2025
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
CVPR 2025
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
CVPR 2025
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
CVPR 2025
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
CVPR 2025
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration
CVPR 2025
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning
CVPR 2025
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
CVPR 2025
MINIMA: Modality Invariant Image Matching
CVPR 2025
Look before You Leap: Dual Logical Verification for Knowledge-based Visual Question Generation
COLING 2024
WikiScenes with Descriptions: Aligning Paragraphs and Sentences with Images in Wikipedia Articles
NAACL 2024
Learning 1D Causal Visual Representation with De-focus Attention Networks
NeurIPS 2024
ArtQuest: Countering Hidden Language Biases in ArtVQA
WACV 2024
CLIP in Mirror: Disentangling text from visual images through reflection
NeurIPS 2024
MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
AAAI 2024