cross-modal learning

521 papers

Also known as

CMP C3HOST

Co-occurring keywords

multimodal learning (4622), contrastive learning (3979), knowledge distillation (3680), representation learning (6174), multi-modal learning (1276), vision-language model (2235), self-supervised learning (3751), domain adaptation (4578), video understanding (1647), zero-shot learning (3637)

Papers

THInImg: Cross-Modal Steganography for Presenting Talking Heads in Images WACV 2024

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner NeurIPS 2024

End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy AAAI 2024

CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation CVPR 2024

Vision-and-Language Navigation via Causal Learning CVPR 2024

Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes CVPR 2024

See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning EMNLP 2024

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training CVPR 2024

AraCLIP: Cross-Lingual Learning for Effective Arabic Image Retrieval ACL 2024

FVTTS: Face Based Voice Synthesis for Text-to-Speech INTERSPEECH 2024

Enhancing Image-to-Text Generation in Radiology Reports through Cross-modal Multi-Task Learning COLING 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training ACL 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models EMNLP 2024

UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity CVPR 2024

Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification CVPR 2024

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation NeurIPS 2024

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification CVPR 2024

Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor NeurIPS 2024

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems ACL 2024

Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection CVPR 2024

Alleviating Foreground Sparsity for Semi-Supervised Monocular 3D Object Detection WACV 2024

Hierarchical Aligned Multimodal Learning for NER on Tweet Posts AAAI 2024

Language-aware Visual Semantic Distillation for Video Question Answering CVPR 2024

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval AAAI 2024

Asymmetric Mutual Alignment for Unsupervised Zero-Shot Sketch-Based Image Retrieval AAAI 2024