conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation
EMNLP 2021
Coreference by Appearance: Visually Grounded Event Coreference Resolution
EMNLP 2021
Towards a Methodology Supporting Semiautomatic Annotation of Head Movements in Video-recorded Conversations
EMNLP 2021
VisualSem: a high-quality knowledge graph for vision and language
EMNLP 2021
Template-aware Attention Model for Earnings Call Report Generation
EMNLP 2021
Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser
EMNLP 2021
Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method
EMNLP 2021
Can images help recognize entities? A study of the role of images for Multimodal NER
EMNLP 2021
Specificity-Preserving RGB-D Saliency Detection
ICCV 2021
Composable Augmentation Encoding for Video Representation Learning
ICCV 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
ICCV 2021
Bridging the Gap Between Label- and Reference-Based Synthesis in Multi-Attribute Image-to-Image Translation
ICCV 2021
Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion
ICCV 2021
The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
ICCV 2021
Sampling Network Guided Cross-Entropy Method for Unsupervised Point Cloud Registration
ICCV 2021
OadTR: Online Action Detection With Transformers
ICCV 2021
Mutual-Complementing Framework for Nuclei Detection and Segmentation in Pathology Image
ICCV 2021
Image Retrieval on Real-Life Images With Pre-Trained Vision-and-Language Models
ICCV 2021
Spatially Conditioned Graphs for Detecting Human-Object Interactions
ICCV 2021
YouRefIt: Embodied Reference Understanding With Language and Gesture
ICCV 2021
Audio2Gestures: Generating Diverse Gestures From Speech Audio With Conditional Variational Autoencoders
ICCV 2021
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
ICCV 2021
Probabilistic Modeling for Human Mesh Recovery
ICCV 2021
Bifold and Semantic Reasoning for Pedestrian Behavior Prediction
ICCV 2021
AI Choreographer: Music Conditioned 3D Dance Generation With AIST++
ICCV 2021