conftrace_

multimodal learning

4645 papers

Co-occurring keywords

large language model (13587); vision-language model (2348); visual question answering (1017); video understanding (1658); multi-modal learning (1278); contrastive learning (4032); representation learning (6206); transfer learning (5449); zero-shot learning (3650); vision language model (767)

Papers

FashionVLP: Vision Language Transformer for Fashion Retrieval With Feedback CVPR 2022

Revisiting the "Video" in Video-Language Understanding CVPR 2022

Touch and Go: Learning from Human-Collected Vision and Touch NIPS 2022

Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts NIPS 2022

CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets NIPS 2022

Flamingo: a Visual Language Model for Few-Shot Learning NIPS 2022

TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition NIPS 2022

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder CVPR 2022

RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining ACL 2022

Multimodal Dialogue Response Generation ACL 2022

Contrastive Visual Semantic Pretraining Magnifies the Semantics of Natural Language Representations ACL 2022

Image Retrieval from Contextual Descriptions ACL 2022

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database ACL 2022

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding ACL 2022

M-SENA: An Integrated Platform for Multimodal Sentiment Analysis ACL 2022

xGQA: Cross-Lingual Visual Question Answering ACL 2022

DU-VLG: Unifying Vision-and-Language Generation via Dual Sequence-to-Sequence Pre-training ACL 2022

Assessing Multilingual Fairness in Pre-trained Multimodal Representations ACL 2022

UNIMO-2: End-to-End Unified Vision-Language Grounded Learning ACL 2022

VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification ACL 2022

Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via CLIP ACL 2022

Combining Language Models and Linguistic Information to Label Entities in Memes ACL 2022

Detecting the Role of an Entity in Harmful Memes: Techniques and their Limitations ACL 2022

Early Diagnosis of Lyme Disease by Recognizing Erythema Migrans Skin Lesion from Images Utilizing Deep Learning Techniques IJCAI 2022

Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos CVPR 2022