conftrace_

multimodal learning

4645 papers

Co-occurring keywords

large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)

Papers

SemEval-2025 Task 1: AdMIRe - Advancing Multimodal Idiomaticity Representation ACL 2025

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction ACL 2025

Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images NAACL 2025

EgoCast: Forecasting Egocentric Human Pose in the Wild WACV 2025

Learning Visual Grounding from Generative Vision and Language Model WACV 2025

Now You See Me: Context-Aware Automatic Audio Description WACV 2025

Focusing on What to Decode and What to Train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor WACV 2025

FOR: Finetuning for Object Level Open Vocabulary Image Retrieval WACV 2025

Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation WACV 2025

VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos WACV 2025

Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding WACV 2025

Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information WACV 2025

Cross-Aligned Fusion for Multimodal Understanding WACV 2025

Mixed Patch Visible-Infrared Modality Agnostic Object Detection WACV 2025

Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios WACV 2025

Multimodal Interpretable Depression Analysis using Visual Physiological Audio and Textual Data WACV 2025

Towards Real-Time Open-Vocabulary Video Instance Segmentation WACV 2025

DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification WACV 2025

CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets WACV 2025

Paladin: Understanding Video Intentions in Political Advertisement Videos WACV 2025

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models NAACL 2025

Efficient Prompting for Continual Adaptation to Missing Modalities NAACL 2025

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark NAACL 2025

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games ICCV 2025

SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation EMNLP 2025