Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Mitigating Open-Vocabulary Caption Hallucinations
EMNLP 2024
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
EMNLP 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
EMNLP 2024
MEANT: Multimodal Encoder for Antecedent Information
EMNLP 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
EMNLP 2024
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
ACL 2024
VisDiaHalBench: A Visual Dialogue Benchmark For Diagnosing Hallucination in Large Vision-Language Models
ACL 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
ACL 2024
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond
ACL 2024
I-AI: A Controllable & Interpretable AI System for Decoding Radiologists' Intense Focus for Accurate CXR Diagnoses
WACV 2024
Beyond Fusion: Modality Hallucination-Based Multispectral Fusion for Pedestrian Detection
WACV 2024
Browse and Concentrate: Comprehending Multimodal Content via Prior-LLM Context Fusion
ACL 2024
Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models
ACL 2024
ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
ACL 2024
Single Frame Semantic Segmentation Using Multi-Modal Spherical Images
WACV 2024
Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species
WACV 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
ACL 2024
Benchmarking Out-of-Distribution Detection in Visual Question Answering
WACV 2024
Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos
WACV 2024
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
ACL 2024
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA
ACL 2024
Depth From Asymmetric Frame-Event Stereo: A Divide-and-Conquer Approach
WACV 2024
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
WACV 2024
Relightful Harmonization: Lighting-aware Portrait Background Replacement
CVPR 2024
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
ACL 2024