conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
CVPR 2025
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
CVPR 2025
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
CVPR 2025
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
CVPR 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
CVPR 2025
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
CVPR 2025
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
CVPR 2025
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
CVPR 2025
Birth and Death of a Rose
CVPR 2025
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding
CVPR 2025
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
CVPR 2025
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
CVPR 2025
Bridging Gait Recognition and Large Language Models Sequence Modeling
CVPR 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
CVPR 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
CVPR 2025
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
CVPR 2025
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
CVPR 2025
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
CVPR 2025
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation
CVPR 2025
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
CVPR 2025
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
CVPR 2025
CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation
CVPR 2025
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
CVPR 2025
StickMotion: Generating 3D Human Motions by Drawing a Stickman
CVPR 2025
Reversible Decoupling Network for Single Image Reflection Removal
CVPR 2025