Multimodal Learning

13,057 papers

Papers

A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation CVPR 2025

Deformable Radial Kernel Splatting CVPR 2025

Post-pre-training for Modality Alignment in Vision-Language Foundation Models CVPR 2025

Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection CVPR 2025

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis CVPR 2025

NVILA: Efficient Frontier Visual Language Models CVPR 2025

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis CVPR 2025

RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models CVPR 2025

Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models CVPR 2025

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models CVPR 2025

Learning to Highlight Audio by Watching Movies CVPR 2025

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat CVPR 2025

HRAvatar: High-Quality and Relightable Gaussian Head Avatar CVPR 2025

MagicQuill: An Intelligent Interactive Image Editing System CVPR 2025

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces CVPR 2025

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding CVPR 2025

Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning CVPR 2025

Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning CVPR 2025

Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks CVPR 2025

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling CVPR 2025

Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification CVPR 2025

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation CVPR 2025

Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark CVPR 2025

Exposure-slot: Exposure-centric Representations Learning with Slot-in-Slot Attention for Region-aware Exposure Correction CVPR 2025

GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control CVPR 2025