conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
CVPR 2025
Deformable Radial Kernel Splatting
CVPR 2025
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
CVPR 2025
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection
CVPR 2025
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
CVPR 2025
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
CVPR 2025
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
CVPR 2025
Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models
CVPR 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
CVPR 2025
Learning to Highlight Audio by Watching Movies
CVPR 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
CVPR 2025
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
CVPR 2025
MagicQuill: An Intelligent Interactive Image Editing System
CVPR 2025
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
CVPR 2025
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
CVPR 2025
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
CVPR 2025
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
CVPR 2025
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
CVPR 2025
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
CVPR 2025
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
CVPR 2025
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
CVPR 2025
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
CVPR 2025
Exposure-slot: Exposure-centric Representations Learning with Slot-in-Slot Attention for Region-aware Exposure Correction
CVPR 2025
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
CVPR 2025