Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
ACL 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
EMNLP 2024
MEANT: Multimodal Encoder for Antecedent Information
EMNLP 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
EMNLP 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
EMNLP 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
EMNLP 2024
Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering
EMNLP 2024
Nearest Neighbor Normalization Improves Multimodal Retrieval
EMNLP 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
Multiple Knowledge-Enhanced Interactive Graph Network for Multimodal Conversational Emotion Recognition
EMNLP 2024
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
EMNLP 2024
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
EMNLP 2024
Individuation in Neural Models with and without Visual Grounding
EMNLP 2024
Retrieval Evaluation for Long-Form and Knowledge-Intensive Image–Text Article Composition
EMNLP 2024
Benchmarking Visually-Situated Translation of Text in Natural Images
EMNLP 2024
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
CVPR 2024
Domain Prompt Learning with Quaternion Networks
CVPR 2024
Make Pixels Dance: High-Dynamic Video Generation
CVPR 2024
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
CVPR 2024
Aligning and Prompting Everything All at Once for Universal Visual Perception
CVPR 2024
Matching Anything by Segmenting Anything
CVPR 2024
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
CVPR 2024
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
CVPR 2024
MoDE: CLIP Data Experts via Clustering
CVPR 2024
Relightful Harmonization: Lighting-aware Portrait Background Replacement
CVPR 2024