Multi-Modal Learning

3194 directly classified papers

Papers

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection CVPR 2021

Multimodal Contrastive Training for Visual Representation Learning CVPR 2021

Structured Scene Memory for Vision-Language Navigation CVPR 2021

Bridge To Answer: Structure-Aware Graph Interaction Network for Video Question Answering CVPR 2021

Rich Context Aggregation With Reflection Prior for Glass Surface Detection CVPR 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs CVPR 2021

Boosting Video Representation Learning With Multi-Faceted Integration CVPR 2021

Repetitive Activity Counting by Sight and Sound CVPR 2021

TediGAN: Text-Guided Diverse Face Image Generation and Manipulation CVPR 2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval With Transformers CVPR 2021

CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback CVPR 2021

HOTR: End-to-End Human-Object Interaction Detection With Transformers CVPR 2021

POSEFusion: Pose-Guided Selective Fusion for Single-View Human Volumetric Capture CVPR 2021

Attention Bottlenecks for Multimodal Fusion NIPS 2021

Neural Dubber: Dubbing for Videos According to Scripts NIPS 2021

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text NIPS 2021

AutoGEL: An Automated Graph Neural Network with Explicit Link Information NIPS 2021

Set Prediction in the Latent Space NIPS 2021

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering NIPS 2021

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model NIPS 2021

PolarStream: Streaming Object Detection and Segmentation with Polar Pillars NIPS 2021

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis NIPS 2021

Point-of-Interest Type Prediction using Text and Images EMNLP 2021

Finnish Dialect Identification: The Effect of Audio and Text EMNLP 2021

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy EMNLP 2021