cross-modal learning

521 papers


Also known as

CMP C3HOST

Co-occurring keywords

multimodal learning (4622) contrastive learning (3979) knowledge distillation (3680) representation learning (6174) multi-modal learning (1276) vision-language model (2235) self-supervised learning (3751) domain adaptation (4578) video understanding (1647) zero-shot learning (3637)

Papers

Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning ACL 2023

Exploiting Pseudo Image Captions for Multimodal Summarization ACL 2023

CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training ACL 2023

Cross-Modal Conceptualization in Bottleneck Models EMNLP 2023

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation EMNLP 2023

Unsupervised Sounding Pixel Learning EMNLP 2023

Zero-Shot Referring Image Segmentation With Global-Local Context Features CVPR 2023

Learning Geometric-Aware Properties in 2D Representation Using Lightweight CAD Models, or Zero Real 3D Pairs CVPR 2023

Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment CVPR 2023

Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning With Multimodal Models CVPR 2023

TOPLight: Lightweight Neural Networks With Task-Oriented Pretraining for Visible-Infrared Recognition CVPR 2023

Video-Text As Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning CVPR 2023

Leveraging per Image-Token Consistency for Vision-Language Pre-Training CVPR 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models CVPR 2023

CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes NeurIPS 2023

Cross-modal Active Complementary Learning with Self-refining Correspondence NeurIPS 2023

Cross-Modal Label Contrastive Learning for Unsupervised Audio-Visual Event Localization AAAI 2023

Target-Free Text-Guided Image Manipulation AAAI 2023

CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels AAAI 2023

Tree-Structured Trajectory Encoding for Vision-and-Language Navigation AAAI 2023

Video-Audio Domain Generalization via Confounder Disentanglement AAAI 2023

PEIT: Bridging the Modality Gap with Pre-trained Models for End-to-End Image Translation ACL 2023

Combo of Thinking and Observing for Outside-Knowledge VQA ACL 2023

MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction ACL 2023

CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation ACL 2023