Research Explorer
Multi-Modal Learning (Artificial Intelligence › Core AI)
1,457 papers directly classified under this topic
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
CVPR 2024
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
CVPR 2024
Mask Grounding for Referring Image Segmentation
CVPR 2024
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
AAAI 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
CVPR 2024
A Hierarchical Network for Multimodal Document-Level Relation Extraction
AAAI 2024
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
CVPR 2024
Generating Human Motion in 3D Scenes from Text Descriptions
CVPR 2024
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
CVPR 2024
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
CVPR 2024
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
AAAI 2024
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
CVPR 2024
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
AAAI 2024
Detours for Navigating Instructional Videos
CVPR 2024
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
CVPR 2024
Text2Loc: 3D Point Cloud Localization from Natural Language
CVPR 2024
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
AAAI 2024
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
CVPR 2024
Detecting and Preventing Hallucinations in Large Vision Language Models
AAAI 2024
Can I Trust Your Answer? Visually Grounded Video Question Answering
CVPR 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
CVPR 2024
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
CVPR 2024
Unified Language-driven Zero-shot Domain Adaptation
CVPR 2024
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
CVPR 2024
MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis
AAAI 2024