conftrace
Papers
Trends
Conferences
Explore
Authors
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning (CVPR 2025)
Free-viewpoint Human Animation with Pose-correlated Reference Selection (CVPR 2025)
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility (CVPR 2025)
Semantic and Expressive Variations in Image Captions Across Languages (CVPR 2025)
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (CVPR 2025)
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking (CVPR 2025)
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning (CVPR 2025)
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts (CVPR 2025)
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation (CVPR 2025)
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency (CVPR 2025)
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding (CVPR 2025)
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
VladVA: Discriminative Fine-tuning of LVLMs (CVPR 2025)
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models (CVPR 2025)
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation (CVPR 2025)
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities (CVPR 2025)
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? (CVPR 2025)
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment (CVPR 2025)
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models (CVPR 2025)
Can Text-to-Video Generation help Video-Language Alignment? (CVPR 2025)
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene (CVPR 2025)
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output (CVPR 2025)
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking (CVPR 2025)
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training (CVPR 2025)
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation (CVPR 2025)