Multimodal Learning

13,057 papers

Papers

EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis CVPR 2025

GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding CVPR 2025

A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment CVPR 2025

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations CVPR 2025

UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing CVPR 2025

Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning CVPR 2025

FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement CVPR 2025

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding CVPR 2025

Birth and Death of a Rose CVPR 2025

SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding CVPR 2025

CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation CVPR 2025

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation CVPR 2025

Bridging Gait Recognition and Large Language Models Sequence Modeling CVPR 2025

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought CVPR 2025

COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts CVPR 2025

Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis CVPR 2025

Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content CVPR 2025

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation CVPR 2025

Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation CVPR 2025

Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images CVPR 2025

DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification CVPR 2025

CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation CVPR 2025

HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment CVPR 2025

StickMotion: Generating 3D Human Motions by Drawing a Stickman CVPR 2025

Reversible Decoupling Network for Single Image Reflection Removal CVPR 2025