conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
CVPR 2025
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
CVPR 2025
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
CVPR 2025
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
CVPR 2025
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
CVPR 2025
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
CVPR 2025
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
CVPR 2025
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis
CVPR 2025
DrVideo: Document Retrieval Based Long Video Understanding
CVPR 2025
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
CVPR 2025
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
CVPR 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
CVPR 2025
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
CVPR 2025
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
CVPR 2025
SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks
CVPR 2025
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
CVPR 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
CVPR 2025
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
CVPR 2025
ShowMak3r: Compositional TV Show Reconstruction
CVPR 2025
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding
CVPR 2025
VideoDirector: Precise Video Editing via Text-to-Video Models
CVPR 2025
LLM-driven Multimodal and Multi-Identity Listening Head Generation
CVPR 2025
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
CVPR 2025