conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
CVPR 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025
Generating Multimodal Driving Scenes via Next-Scene Prediction
CVPR 2025
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale
CVPR 2025
Locality-Aware Zero-Shot Human-Object Interaction Detection
CVPR 2025
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
CVPR 2025
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
CVPR 2025
DocVLM: Make Your VLM an Efficient Reader
CVPR 2025
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
CVPR 2025
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
CVPR 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
CVPR 2025
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
CVPR 2025
SketchAgent: Language-Driven Sequential Sketch Generation
CVPR 2025
DRAWER: Digital Reconstruction and Articulation With Environment Realism
CVPR 2025
Empowering LLMs to Understand and Generate Complex Vector Graphics
CVPR 2025
MLVU: Benchmarking Multi-task Long Video Understanding
CVPR 2025
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
CVPR 2025
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
CVPR 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
CVPR 2025
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
CVPR 2025
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
CVPR 2025
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
CVPR 2025
Pippo: High-Resolution Multi-View Humans from a Single Image
CVPR 2025
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders
CVPR 2025
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
CVPR 2025