cross-modal learning

521 papers

Also known as

CMP C3HOST

Co-occurring keywords

multimodal learning (4622), contrastive learning (3979), knowledge distillation (3680), representation learning (6174), multi-modal learning (1276), vision-language model (2235), self-supervised learning (3751), domain adaptation (4578), video understanding (1647), zero-shot learning (3637)

Papers

Inverse Compositional Learning for Weakly-supervised Relation Grounding ICCV 2023

Event Camera Data Pre-training ICCV 2023

Verbs in Action: Improving Verb Understanding in Video-Language Models ICCV 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition ICCV 2023

Temporal Collection and Distribution for Referring Video Object Segmentation ICCV 2023

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending ICCV 2023

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment ICCV 2023

Turbo your multi-modal classification with contrastive learning INTERSPEECH 2023

PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions INTERSPEECH 2023

Enhance Temporal Relations in Audio Captioning with Sound Event Detection INTERSPEECH 2023

Multi-Scale Attention for Audio Question Answering INTERSPEECH 2023

Image-driven Audio-visual Universal Source Separation INTERSPEECH 2023

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation INTERSPEECH 2023

DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning AAAI 2023

Mx2M: Masked Cross-Modality Modeling in Domain Adaptation for 3D Semantic Segmentation AAAI 2023

Accommodating Audio Modality in CLIP for Multimodal Processing AAAI 2023

Global-Local Characteristic Excited Cross-Modal Attacks from Images to Videos AAAI 2023

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets AAAI 2023

Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training ACL 2023

TAVT: Towards Transferable Audio-Visual Text Generation ACL 2023

CocaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval ACL 2023

Deeply Coupled Cross-Modal Prompt Learning ACL 2023

Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation ACL 2023

CKDST: Comprehensively and Effectively Distill Knowledge from Machine Translation to End-to-End Speech Translation ACL 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training ACL 2023