Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
ICCV 2025
Everything is a Video: Unifying Modalities through Next-Frame Prediction
ICCV 2025
Counting Stacked Objects
ICCV 2025
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
ICCV 2025
OrderChain: Towards General Instruct-Tuning for Stimulating the Ordinal Understanding Ability of MLLM
ICCV 2025
Fix-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text
ICCV 2025
Benchmarking Multimodal Large Language Models Against Image Corruptions
ICCV 2025
Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion
ICCV 2025
What's Making That Sound Right Now? Video-centric Audio-Visual Localization
ICCV 2025
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
ICCV 2025
BASIC: Boosting Visual Alignment with Intrinsic Refined Embeddings in Multimodal Large Language Models
ICCV 2025
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
ICCV 2025
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
ICCV 2025
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
ICCV 2025
SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
ICCV 2025
Boosting Multimodal Learning via Disentangled Gradient Learning
ICCV 2025
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
ICCV 2025
DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding
ICCV 2025
Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning
ICCV 2025
SAMPLE: Semantic Alignment through Temporal-Adaptive Multimodal Prompt Learning for Event-Based Open-Vocabulary Action Recognition
ICCV 2025
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities
ICCV 2025
Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features
ICCV 2025
FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models
ICCV 2025
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals
EMNLP 2025