conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
CVPR 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
CVPR 2025
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
CVPR 2025
Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model
CVPR 2025
FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy
CVPR 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
CVPR 2025
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
CVPR 2025
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
CVPR 2025
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
CVPR 2025
Instruction-based Image Manipulation by Watching How Things Move
CVPR 2025
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
CVPR 2025
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
CVPR 2025
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
CVPR 2025
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
CVPR 2025
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
CVPR 2025
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
CVPR 2025
MatAnyone: Stable Video Matting with Consistent Memory Propagation
CVPR 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
CVPR 2025
Functionality Understanding and Segmentation in 3D Scenes
CVPR 2025
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
CVPR 2025
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
CVPR 2025
Self-Evolving Visual Concept Library using Vision-Language Critics
CVPR 2025
Multimodal Autoregressive Pre-training of Large Vision Encoders
CVPR 2025
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
CVPR 2025