Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
MAR: Matching-Augmented Reasoning for Enhancing Visual-based Entity Question Answering
EMNLP 2024
Individuation in Neural Models with and without Visual Grounding
EMNLP 2024
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
EMNLP 2024
Retrieval Evaluation for Long-Form and Knowledge-Intensive Image–Text Article Composition
EMNLP 2024
Multiple Knowledge-Enhanced Interactive Graph Network for Multimodal Conversational Emotion Recognition
EMNLP 2024
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
EMNLP 2024
Benchmarking Visually-Situated Translation of Text in Natural Images
EMNLP 2024
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models
EMNLP 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
EMNLP 2024
Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering
EMNLP 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
EMNLP 2024
Nearest Neighbor Normalization Improves Multimodal Retrieval
EMNLP 2024
Look before You Leap: Dual Logical Verification for Knowledge-based Visual Question Generation
COLING 2024
Multi-Modal Gaze Following in Conversational Scenarios
WACV 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
EMNLP 2024
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
ACL 2024
VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models
ACL 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
ACL 2024
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
ACL 2024
Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion
ACL 2024
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
ACL 2024
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
ACL 2024