conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
CVPR 2025
EmoEdit: Evoking Emotions through Image Manipulation
CVPR 2025
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
CVPR 2025
Video-Bench: Human-Aligned Video Generation Benchmark
CVPR 2025
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
CVPR 2025
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
CVPR 2025
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
CVPR 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
CVPR 2025
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
CVPR 2025
Dual Diffusion for Unified Image Generation and Understanding
CVPR 2025
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
CVPR 2025
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
CVPR 2025
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
CVPR 2025
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
CVPR 2025
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution
CVPR 2025
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
CVPR 2025
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
CVPR 2025
Flexible Frame Selection for Efficient Video Reasoning
CVPR 2025
EventGPT: Event Stream Understanding with Multimodal Large Language Models
CVPR 2025
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
CVPR 2025
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
CVPR 2025
Number it: Temporal Grounding Videos like Flipping Manga
CVPR 2025
PromptHMR: Promptable Human Mesh Recovery
CVPR 2025
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
CVPR 2025