Multi-Modal Learning

1213 directly classified papers

Papers

Reject Decoding via Language-Vision Models for Text-to-Image Synthesis AAAI 2023

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images ICCV 2023

Action-Conditioned Generation of Bimanual Object Manipulation Sequences AAAI 2023

MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation CVPR 2023

Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction CVPR 2023

Learning To Detect and Segment for Open Vocabulary Object Detection CVPR 2023

Does Listener Gaze in Face-to-Face Interaction Follow the Entropy Rate Constancy Principle: An Empirical Study EMNLP 2023

PronScribe: Highly Accurate Multimodal Phonemic Transcription From Speech and Text INTERSPEECH 2023

Improving Bilingual TTS Using Language And Phonology Embedding With Embedding Strength Modulator INTERSPEECH 2023

Exploring the Impact of Back-End Network on Wav2vec 2.0 for Dialect Identification INTERSPEECH 2023

Rethinking the Visual Cues in Audio-Visual Speaker Extraction INTERSPEECH 2023

Multi-channel separation of dynamic speech and sound events INTERSPEECH 2023

Directional Speech Recognition for Speaker Disambiguation and Cross-talk Suppression INTERSPEECH 2023

GC-Hunter at ImageArg Shared Task: Multi-Modal Stance and Persuasiveness Learning EMNLP 2023

Argumentative Stance Prediction: An Exploratory Study on Multimodality and Few-Shot Learning EMNLP 2023

BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency CVPR 2023

MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences CVPR 2023

Active Exploration of Multimodal Complementarity for Few-Shot Action Recognition CVPR 2023

RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval CVPR 2023

MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition CVPR 2023

Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation CVPR 2023

OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation CVPR 2023

Depth Estimation From Camera Image and mmWave Radar Point Cloud CVPR 2023

BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion INTERSPEECH 2023

MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer ICCV 2023