Multi-Modal Learning

3194 directly classified papers

Papers

ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding EMNLP 2022

Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation AAAI 2022

L-CoDe: Language-Based Colorization Using Color-Object Decoupled Conditions AAAI 2022

Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification AAAI 2022

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents AAAI 2022

VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification ACL 2022

Comprehensive Multi-Modal Interactions for Referring Image Segmentation ACL 2022

One Agent To Rule Them All: Towards Multi-agent Conversational AI ACL 2022

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding ACL 2022

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning ACL 2022

Interpreting Gender Bias in Neural Machine Translation: Multilingual Architecture Matters AAAI 2022

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data AAAI 2022

Team IITP-AINLPML at WASSA 2022: Empathy Detection, Emotion Classification and Personality Detection ACL 2022

SSNCSE NLP@TamilNLP-ACL2022: Transformer based approach for detection of abusive comment for Tamil language ACL 2022

Understanding Attention for Vision-and-Language Tasks COLING 2022

On Guiding Visual Attention With Language Specification CVPR 2022

M5Product: Self-Harmonized Contrastive Learning for E-Commercial Multi-Modal Pretraining CVPR 2022

CAT-Det: Contrastively Augmented Transformer for Multi-Modal 3D Object Detection CVPR 2022

Breaking Down Multilingual Machine Translation ACL 2022

Assessing Multilingual Fairness in Pre-trained Multimodal Representations ACL 2022

Modality-specific Learning Rates for Effective Multimodal Additive Late-fusion ACL 2022

What do Models Learn From Training on More Than Text? Measuring Visual Commonsense Knowledge ACL 2022

Data Augmented 3D Semantic Scene Completion With 2D Segmentation Priors WACV 2022

Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning WACV 2022

DG-Labeler and DGL-MOTS Dataset: Boost the Autonomous Driving Perception WACV 2022