Multimodal Learning

13,057 papers

Papers

Detector-Free Weakly Supervised Grounding by Separation ICCV 2021

CrossCLR: Cross-Modal Contrastive Learning for Multi-Modal Video Representations ICCV 2021

TransView: Inside, Outside, and Across the Cropping View Boundaries ICCV 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering ICCV 2021

How To Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild ICCV 2021

Just Ask: Learning To Answer Questions From Millions of Narrated Videos ICCV 2021

UniT: Multimodal Multitask Learning With a Unified Transformer ICCV 2021

Compressing Visual-Linguistic Model via Knowledge Distillation ICCV 2021

Telling the What While Pointing to the Where: Multimodal Queries for Image Retrieval ICCV 2021

Motion Guided Region Message Passing for Video Captioning ICCV 2021

AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis ICCV 2021

TACo: Token-Aware Cascade Contrastive Learning for Video-Text Alignment ICCV 2021

Zero-Shot Natural Language Video Localization ICCV 2021

MDETR - Modulated Detection for End-to-End Multi-Modal Understanding ICCV 2021

STVGBert: A Visual-Linguistic Transformer Based Framework for Spatio-Temporal Video Grounding ICCV 2021

Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation ICCV 2021

LapsCore: Language-Guided Person Search via Color Reasoning ICCV 2021

Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos ICCV 2021

Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video ICCV 2021

Move2Hear: Active Audio-Visual Source Separation ICCV 2021

Three Steps to Multimodal Trajectory Prediction: Modality Clustering, Classification and Synthesis ICCV 2021

Consistency-Aware Graph Network for Human Interaction Understanding ICCV 2021

Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance ICCV 2021

GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition ICCV 2021

Summarize and Search: Learning Consensus-Aware Dynamic Convolution for Co-Saliency Detection ICCV 2021