conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
SemEval-2025 Task 1: AdMIRe - Advancing Multimodal Idiomaticity Representation
ACL 2025
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
ACL 2025
Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
NAACL 2025
EgoCast: Forecasting Egocentric Human Pose in the Wild
WACV 2025
Learning Visual Grounding from Generative Vision and Language Model
WACV 2025
Now You See Me: Context-Aware Automatic Audio Description
WACV 2025
Focusing on What to Decode and What to Train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor
WACV 2025
FOR: Finetuning for Object Level Open Vocabulary Image Retrieval
WACV 2025
Information Extraction from Heterogeneous Documents without Ground Truth Labels using Synthetic Label Generation and Knowledge Distillation
WACV 2025
VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos
WACV 2025
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
WACV 2025
Make VLM Recognize Visual Hallucination on Cartoon Character Image with Pose Information
WACV 2025
Cross-Aligned Fusion for Multimodal Understanding
WACV 2025
Mixed Patch Visible-Infrared Modality Agnostic Object Detection
WACV 2025
Semantically Conditioned Prompts for Visual Recognition under Missing Modality Scenarios
WACV 2025
Multimodal Interpretable Depression Analysis using Visual Physiological Audio and Textual Data
WACV 2025
Towards Real-Time Open-Vocabulary Video Instance Segmentation
WACV 2025
DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification
WACV 2025
CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets
WACV 2025
Paladin: Understanding Video Intentions in Political Advertisement Videos
WACV 2025
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
NAACL 2025
Efficient Prompting for Continual Adaptation to Missing Modalities
NAACL 2025
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
NAACL 2025
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
ICCV 2025
SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
EMNLP 2025