Multimodal Learning

13,057 papers

Papers

Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation CVPR 2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis CVPR 2025

Generating Multimodal Driving Scenes via Next-Scene Prediction CVPR 2025

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale CVPR 2025

Locality-Aware Zero-Shot Human-Object Interaction Detection CVPR 2025

PEACE: Empowering Geologic Map Holistic Understanding with MLLMs CVPR 2025

ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation CVPR 2025

DocVLM: Make Your VLM an Efficient Reader CVPR 2025

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition CVPR 2025

GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency CVPR 2025

Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics CVPR 2025

LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models CVPR 2025

SketchAgent: Language-Driven Sequential Sketch Generation CVPR 2025

DRAWER: Digital Reconstruction and Articulation With Environment Realism CVPR 2025

Empowering LLMs to Understand and Generate Complex Vector Graphics CVPR 2025

MLVU: Benchmarking Multi-task Long Video Understanding CVPR 2025

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees CVPR 2025

ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models CVPR 2025

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors CVPR 2025

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide CVPR 2025

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction CVPR 2025

VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment CVPR 2025

Pippo: High-Resolution Multi-View Humans from a Single Image CVPR 2025

MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders CVPR 2025

CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model CVPR 2025