conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
CVPR 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
CVPR 2025
Magma: A Foundation Model for Multimodal AI Agents
CVPR 2025
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
CVPR 2025
SerialGen: Personalized Image Generation by First Standardization Then Personalization
CVPR 2025
Matrix3D: Large Photogrammetry Model All-in-One
CVPR 2025
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
CVPR 2025
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
CVPR 2025
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
CVPR 2025
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues
CVPR 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
CVPR 2025
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
CVPR 2025
Robust Message Embedding via Attention Flow-Based Steganography
CVPR 2025
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
CVPR 2025
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CVPR 2025
ChatHuman: Chatting about 3D Humans with Tools
CVPR 2025
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
CVPR 2025
DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework
CVPR 2025
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
CVPR 2025
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
CVPR 2025
Multi-modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery
CVPR 2025
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
CVPR 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
CVPR 2025
Reconstructing Animals and the Wild
CVPR 2025