Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ACL 2025
Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
EMNLP 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
CVPR 2025
TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority Languages
EMNLP 2025
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
ACL 2025
Transparent and Coherent Procedural Mistake Detection
EMNLP 2025
Learning to Highlight Audio by Watching Movies
CVPR 2025
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
EMNLP 2025
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
ACL 2025
SHARP: Steering Hallucination in LVLMs via Representation Engineering
EMNLP 2025
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
CVPR 2025
VRoPE: Rotary Position Embedding for Video Large Language Models
EMNLP 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
ACL 2025
WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
EMNLP 2025
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
CVPR 2025
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
CVPR 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
CVPR 2025
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
CVPR 2025
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation
CVPR 2025
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
CVPR 2025
AKiRa: Augmentation Kit on Rays for Optical Video Generation
CVPR 2025
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
CVPR 2025
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
CVPR 2025
Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder
ACL 2025