Multi-Modal Learning

1213 directly classified papers

Papers

Reliable Conflictive Multi-View Learning AAAI 2024

LAFA: Multimodal Knowledge Graph Completion with Link Aware Fusion and Aggregation AAAI 2024

Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization WACV 2024

SURER: Structure-Adaptive Unified Graph Neural Network for Multi-View Clustering AAAI 2024

Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance CVPR 2024

Little Red Riding Hood Goes around the Globe: Crosslingual Story Planning and Generation with Large Language Models COLING 2024

Knowledge-Guided Cross-Topic Visual Question Generation COLING 2024

Improving Personalized Sentiment Representation with Knowledge-enhanced and Parameter-efficient Layer Normalization COLING 2024

LGMRec: Local and Global Graph Learning for Multimodal Recommendation AAAI 2024

Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling WACV 2024

ActionIE: Action Extraction from Scientific Literature with Programming Languages ACL 2024

Language-driven Grasp Detection CVPR 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs CVPR 2024

Novel Class Discovery in Chest X-rays via Paired Images and Text AAAI 2024

Unveiling Implicit Deceptive Patterns in Multi-Modal Fake News via Neuro-Symbolic Reasoning AAAI 2024

Cross-spectral Gated-RGB Stereo Depth Estimation CVPR 2024

Language-aware Visual Semantic Distillation for Video Question Answering CVPR 2024

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges CVPR 2024

BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed CVPR 2024

Semantic Fusion Augmentation and Semantic Boundary Detection: A Novel Approach to Multi-Target Video Moment Retrieval WACV 2024

Controllable Text-to-Image Synthesis for Multi-Modality MR Images WACV 2024

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels CVPR 2024

FELGA: Unsupervised Fragment Embedding for Fine-Grained Cross-Modal Association WACV 2024

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping WACV 2024

RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method CVPR 2024