Multimodal Learning

13,057 papers

Papers

Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation EMNLP 2021

Coreference by Appearance: Visually Grounded Event Coreference Resolution EMNLP 2021

Towards a Methodology Supporting Semiautomatic Annotation of HeadMovements in Video-recorded Conversations EMNLP 2021

VisualSem: a high-quality knowledge graph for vision and language EMNLP 2021

Template-aware Attention Model for Earnings Call Report Generation EMNLP 2021

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser EMNLP 2021

Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method EMNLP 2021

Can images help recognize entities? A study of the role of images for Multimodal NER EMNLP 2021

Specificity-Preserving RGB-D Saliency Detection ICCV 2021

Composable Augmentation Encoding for Video Representation Learning ICCV 2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation ICCV 2021

Bridging the Gap Between Label- and Reference-Based Synthesis in Multi-Attribute Image-to-Image Translation ICCV 2021

Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion ICCV 2021

The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation ICCV 2021

Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration ICCV 2021

OadTR: Online Action Detection With Transformers ICCV 2021

Mutual-Complementing Framework for Nuclei Detection and Segmentation in Pathology Image ICCV 2021

Image Retrieval on Real-Life Images With Pre-Trained Vision-and-Language Models ICCV 2021

Spatially Conditioned Graphs for Detecting Human-Object Interactions ICCV 2021

YouRefIt: Embodied Reference Understanding With Language and Gesture ICCV 2021

Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders ICCV 2021

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition ICCV 2021

Probabilistic Modeling for Human Mesh Recovery ICCV 2021

Bifold and Semantic Reasoning for Pedestrian Behavior Prediction ICCV 2021

AI Choreographer: Music Conditioned 3D Dance Generation With AIST++ ICCV 2021