Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
3D Part Segmentation via Geometric Aggregation of 2D Visual Features (WACV 2025)
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning (ICCV 2025)
Feature Design for Bridging SAM and CLIP toward Referring Image Segmentation (WACV 2025)
ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection (ICCV 2025)
Persian in a Court: Benchmarking VLMs In Persian Multi-Modal Tasks (COLING 2025)
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding (ICCV 2025)
Benchmarking Multimodal Large Language Models Against Image Corruptions (ICCV 2025)
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models (ICCV 2025)
If I feel smart, I will do the right thing: Combining Complementary Multimodal Information in Visual Language Models (COLING 2025)
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition (ICCV 2025)
MDSBots@NLU of Devanagari Script Languages 2025: Detection of Language, Hate Speech, and Targets using MURTweet (COLING 2025)
Scaling Language-Free Visual Representation Learning (ICCV 2025)
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions (CVPR 2025)
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment (AAAI 2025)
UIPro: Unleashing Superior Interaction Capability For GUI Agents (ICCV 2025)
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning (ICCV 2025)
CryoDomain: Sequence-free Protein Domain Identification from Low-resolution Cryo-EM Density Maps (AAAI 2025)
SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches (IJCAI 2025)
Multi-to-Single: Reducing Multimodal Dependency in Emotion Recognition Through Contrastive Learning (AAAI 2025)
A Cross-Modal Densely Guided Knowledge Distillation Based on Modality Rebalancing Strategy for Enhanced Unimodal Emotion Recognition (IJCAI 2025)
Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control (AAAI 2025)
BMIP: Bi-directional Modality Interaction Prompt Learning for VLM (IJCAI 2025)
Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment (AAAI 2025)
Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning (IJCAI 2025)
Multi-fingered Hand Grasps with Visuo-Tactile Fusion via Multi-Agent Deep Reinforcement Learning (AAAI 2025)