cross-modal learning

521 papers

Also known as

CMP C3HOST

Co-occurring keywords

multimodal learning (4622), contrastive learning (3979), knowledge distillation (3680), representation learning (6174), multi-modal learning (1276), vision-language model (2235), self-supervised learning (3751), domain adaptation (4578), video understanding (1647), zero-shot learning (3637)

Papers

CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation COLING 2025

Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text EMNLP 2025

Uncertainty-Aware Cross-Modal Alignment for Hate Speech Detection COLING 2024

TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling COLING 2024

Monocular 3D Object Detection With LiDAR Guided Semi Supervised Active Learning WACV 2024

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems ACL 2024

Continual Audio-Visual Sound Separation NeurIPS 2024

Tell Me What Is Good about This Property: Leveraging Reviews for Segment-Personalized Image Collection Summarization AAAI 2024

Video Event Extraction with Multi-View Interaction Knowledge Distillation AAAI 2024

AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval ACL 2024

Exploiting Auxiliary Caption for Video Grounding AAAI 2024

Text-Guided Face Recognition Using Multi-Granularity Cross-Modal Contrastive Learning WACV 2024

Complex Organ Mask Guided Radiology Report Generation WACV 2024

Visual Hallucination Elevates Speech Recognition AAAI 2024

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches CVPR 2024

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation NeurIPS 2024

Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval AAAI 2024

Noise-Aware Image Captioning with Progressively Exploring Mismatched Words AAAI 2024

Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling COLING 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training ACL 2024

DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition AAAI 2024

HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities WACV 2024

HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information WACV 2024

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts AAAI 2024

THInImg: Cross-Modal Steganography for Presenting Talking Heads in Images WACV 2024