Multimodal Learning

13,057 papers

Papers

DreamAlign: Dynamic Text-to-3D Optimization with Human Preference Alignment AAAI 2025

Union Is Strength! Unite the Power of LLMs and MLLMs for Chart Question Answering AAAI 2025

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation AAAI 2025

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation AAAI 2025

HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation AAAI 2025

Unveiling the Knowledge of CLIP for Training-Free Open-Vocabulary Semantic Segmentation AAAI 2025

DoGA: Enhancing Grounded Object Detection via Grouped Pre-Training with Attributes AAAI 2025

Learning Dynamic Similarity by Bidirectional Hierarchical Sliding Semantic Probe for Efficient Text Video Retrieval AAAI 2025

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment AAAI 2025

CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment AAAI 2025

Towards Robust Visual Question Answering via Prompt-Driven Geometric Harmonization AAAI 2025

See Through Their Minds: Learning Transferable Brain Decoding Models from Cross-Subject fMRI AAAI 2025

SCOPE: Sign Language Contextual Processing with Embedding from LLMs AAAI 2025

Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning AAAI 2025

RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba AAAI 2025

Generative Video Diffusion for Unseen Novel Semantic Video Moment Retrieval AAAI 2025

Revisiting Change Captioning from Self-supervised Global-Part Alignment AAAI 2025

Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance AAAI 2025

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models AAAI 2025

Aligning and Prompting Anything for Zero-Shot Generalized Anomaly Detection AAAI 2025

Does VLM Classification Benefit from LLM Description Semantics? AAAI 2025

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models AAAI 2025

Black-Box Test-Time Prompt Tuning for Vision-Language Models AAAI 2025

EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models AAAI 2025

Extract Free Dense Misalignment from CLIP AAAI 2025