conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
CVPR 2025
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
CVPR 2025
Towards General Visual-Linguistic Face Forgery Detection
CVPR 2025
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
CVPR 2025
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
CVPR 2025
Seeing More with Less: Human-like Representations in Vision Models
CVPR 2025
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
CVPR 2025
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
CVPR 2025
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
CVPR 2025
Localizing Events in Videos with Multimodal Queries
CVPR 2025
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
CVPR 2025
Language-Guided Salient Object Ranking
CVPR 2025
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
CVPR 2025
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
CVPR 2025
VoCo-LLaMA: Towards Vision Compression with Large Language Models
CVPR 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
CVPR 2025
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
CVPR 2025
Towards All-in-One Medical Image Re-Identification
CVPR 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
CVPR 2025
HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery
CVPR 2025
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
CVPR 2025
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
CVPR 2025
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
CVPR 2025
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction
CVPR 2025
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
CVPR 2025