Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
WACV 2025
Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
EMNLP 2025
VIIS: Visible and Infrared Information Synthesis for Severe Low-Light Image Enhancement
WACV 2025
Misogynistic Meme Detection in Dravidian Languages Using Kolmogorov Arnold-based Networks
NAACL 2025
MMOne: Representing Multiple Modalities in One Scene
ICCV 2025
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
ICCV 2025
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
COLING 2025
TaiwanVQA: A Benchmark for Visual Question Answering for Taiwanese Daily Life
COLING 2025
OVQA: A Dataset for Visual Question Answering and Multimodal Research in Odia Language
COLING 2025
Understanding Figurative Meaning through Explainable Visual Entailment
NAACL 2025
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
NAACL 2025
MNLP@DravidianLangTech 2025: A Deep Multimodal Neural Network for Hate Speech Detection in Dravidian Languages
NAACL 2025
AKiRa: Augmentation Kit on Rays for Optical Video Generation
CVPR 2025
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
CVPR 2025
InnovationEngineers@DravidianLangTech 2025: Enhanced CNN Models for Detecting Misogyny in Tamil Memes Using Image and Text Classification
NAACL 2025
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
CVPR 2025
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
CVPR 2025
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
CVPR 2025
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
CVPR 2025
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
CVPR 2025
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
CVPR 2025
MaRI: Material Retrieval Integration across Domains
CVPR 2025
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
CVPR 2025
Semantic and Sequential Alignment for Referring Video Object Segmentation
CVPR 2025
Learning Beyond Still Frames: Scaling Vision-Language Models with Video
ICCV 2025