Multimodal Learning

13,057 papers

Papers

Efficient Transfer Learning for Video-language Foundation Models CVPR 2025

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction CVPR 2025

3D Dental Model Segmentation with Geometrical Boundary Preserving CVPR 2025

Neuro-3D: Towards 3D Visual Decoding from EEG Signals CVPR 2025

FastVLM: Efficient Vision Encoding for Vision Language Models CVPR 2025

GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation CVPR 2025

Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System? CVPR 2025

FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training CVPR 2025

OpenSDI: Spotting Diffusion-Generated Images in the Open World CVPR 2025

FLAIR: VLM with Fine-grained Language-informed Image Representations CVPR 2025

GG-SSMs: Graph-Generating State Space Models CVPR 2025

STDD: Spatio-Temporal Dual Diffusion for Video Generation CVPR 2025

Implicit Correspondence Learning for Image-to-Point Cloud Registration CVPR 2025

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity CVPR 2025

ReWind: Understanding Long Videos with Instructed Learnable Memory CVPR 2025

Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration CVPR 2025

ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling CVPR 2025

DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos CVPR 2025

Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment CVPR 2025

LOCORE: Image Re-ranking with Long-Context Sequence Modeling CVPR 2025

NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction CVPR 2025

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments CVPR 2025

AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis CVPR 2025

Towards Training-free Anomaly Detection with Vision and Language Foundation Models CVPR 2025

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis CVPR 2025