conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Controllable Human Image Generation with Personalized Multi-Garments
CVPR 2025
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs
CVPR 2025
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
CVPR 2025
Video-Guided Foley Sound Generation with Multimodal Controls
CVPR 2025
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
CVPR 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
CVPR 2025
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
CVPR 2025
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
CVPR 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
CVPR 2025
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
CVPR 2025
Temporal Action Detection Model Compression by Progressive Block Drop
CVPR 2025
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
CVPR 2025
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
CVPR 2025
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
CVPR 2025
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
CVPR 2025
AutoPresent: Designing Structured Visuals from Scratch
CVPR 2025
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
CVPR 2025
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
CVPR 2025
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
CVPR 2025
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
CVPR 2025
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
CVPR 2025
Reasoning to Attend: Try to Understand How <SEG> Token Works
CVPR 2025
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
CVPR 2025
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
CVPR 2025
Font-Agent: Enhancing Font Understanding with Large Language Models
CVPR 2025