conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
CVPR 2025
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
CVPR 2025
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
CVPR 2025
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
Embodied Scene Understanding for Vision Language Models via MetaVQA
CVPR 2025
Assessing and Learning Alignment of Unimodal Vision and Language Models
CVPR 2025
PAVE: Patching and Adapting Video Large Language Models
CVPR 2025
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion
CVPR 2025
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers
CVPR 2025
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
CVPR 2025
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
CVPR 2025
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
CVPR 2025
Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening
CVPR 2025
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
CVPR 2025
Taxonomy-Aware Evaluation of Vision-Language Models
CVPR 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
CVPR 2025
Video Language Model Pretraining with Spatio-temporal Masking
CVPR 2025
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
CVPR 2025
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
CVPR 2025
Active Data Curation Effectively Distills Large-Scale Multimodal Models
CVPR 2025
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
CVPR 2025
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
CVPR 2025
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior
CVPR 2025
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
CVPR 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
CVPR 2025