Research Explorer
Multimodal Learning
1257 directly classified papers
Papers per year
2008: 1
2009: 2
2010: 2
2011: 1
2012: 3
2013: 3
2014: 2
2015: 5
2017: 11
2018: 25
2019: 33
2020: 66
2021: 47
2022: 113
2023: 199
2024: 325
2025: 411
2026: 8
Papers
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
CVPR 2025
Logits DeConfusion with CLIP for Few-Shot Learning
CVPR 2025
MNLP@DravidianLangTech 2025: Transformer-based Multimodal Framework for Misogyny Meme Detection
NAACL 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
CVPR 2025
SEAL: Semantic Attention Learning for Long Video Representation
CVPR 2025
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers
CVPR 2025
Exploring Contextual Attribute Density in Referring Expression Counting
CVPR 2025
HuMoCon: Concept Discovery for Human Motion Understanding
CVPR 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
CVPR 2025
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
CVPR 2025
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
CVPR 2025
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
CVPR 2025
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CVPR 2025
Less Attention is More: Prompt Transformer for Generalized Category Discovery
CVPR 2025
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
CVPR 2025
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones
CVPR 2025
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
CVPR 2025
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
CVPR 2025
Ambiguity-aware Multi-level Incongruity Fusion Network for Multi-Modal Sarcasm Detection
COLING 2025
Cause-Effect Driven Optimization for Robust Medical Visual Question Answering with Language Biases
IJCAI 2025
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
IJCNLP 2025
Screening, Rectifying, and Re-Screening: A Unified Framework for Tuning Vision-Language Models with Noisy Labels
IJCAI 2025
HerWILL@DravidianLangTech 2025: Ensemble Approach for Misogyny Detection in Memes Using Pre-trained Text and Vision Transformers
NAACL 2025
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
ICCV 2025
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
AAAI 2025