conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
CVPR 2025
Online Video Understanding: OVBench and VideoChat-Online
CVPR 2025
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images
CVPR 2025
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
CVPR 2025
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
CVPR 2025
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
CVPR 2025
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
CVPR 2025
InsightEdit: Towards Better Instruction Following for Image Editing
CVPR 2025
Turbo3D: Ultra-fast Text-to-3D Generation
CVPR 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
CVPR 2025
Visual Lexicon: Rich Image Features in Language Space
CVPR 2025
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
CVPR 2025
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
CVPR 2025
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
CVPR 2025
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
CVPR 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
CVPR 2025
Contextual AD Narration with Interleaved Multimodal Sequence
CVPR 2025
FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation
CVPR 2025
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
CVPR 2025
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
CVPR 2025
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
CVPR 2025
CrossOver: 3D Scene Cross-Modal Alignment
CVPR 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
CVPR 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
CVPR 2025
Style-Editor: Text-driven Object-centric Style Editing
CVPR 2025