Multimodal Learning

13,057 papers

Papers

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation AAAI 2025

DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification AAAI 2025

MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt AAAI 2025

IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis AAAI 2025

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation AAAI 2025

VA-AR: Learning Velocity-Aware Action Representations with Mixture of Window Attention AAAI 2025

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians AAAI 2025

ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation AAAI 2025

VarCMP: Adapting Cross-Modal Pre-Training Models for Video Anomaly Retrieval AAAI 2025

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning AAAI 2025

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark AAAI 2025

FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos AAAI 2025

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval AAAI 2025

Cross-modulated Attention Transformer for RGBT Tracking AAAI 2025

TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning AAAI 2025

Expand VSR Benchmark for VLLM to Expertize in Spatial Rules AAAI 2025

Few-Shot Incremental Learning via Foreground Aggregation and Knowledge Transfer for Audio-Visual Semantic Segmentation AAAI 2025

Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning AAAI 2025

CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification AAAI 2025

PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes AAAI 2025

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models AAAI 2025

HOIMamba: Efficient Mamba-based Disentangled Progressive Learning for HOI Detection AAAI 2025

FLAME: Learning to Navigate with Multimodal LLM in Urban Environments AAAI 2025

FATE: Feature-Adapted Parameter Tuning for Vision-Language Models AAAI 2025

RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting AAAI 2025