conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
What Does Your Smile Mean? Jointly Detecting Multi-Modal Sarcasm and Sentiment Using Quantum Probability
EMNLP 2021
Integrating Visuospatial, Linguistic, and Commonsense Structure into Story Visualization
EMNLP 2021
DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video
WACV 2021
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
EMNLP 2021
Journalistic Guidelines Aware News Image Captioning
EMNLP 2021
Look Before You Speak: Visually Contextualized Utterances
CVPR 2021
Domain-Robust VQA With Diverse Datasets and Methods but No Target Labels
CVPR 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
CVPR 2021
FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation
CVPR 2021
HOTR: End-to-End Human-Object Interaction Detection With Transformers
CVPR 2021
Connecting What To Say With Where To Look by Modeling Human Attention Traces
CVPR 2021
Can Audio-Visual Integration Strengthen Robustness Under Multimodal Attacks?
CVPR 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021
Learning the Best Pooling Strategy for Visual Semantic Embedding
CVPR 2021
Beyond Image to Depth: Improving Depth Prediction Using Echoes
CVPR 2021
Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective
IJCAI 2021
StacMR: Scene-Text Aware Cross-Modal Retrieval
WACV 2021
Multimodal Humor Dataset: Predicting Laughter Tracks for Sitcoms
WACV 2021
Compositional Learning of Image-Text Query for Image Retrieval
WACV 2021
Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning
AAAI 2021
Enhanced Audio Tagging via Multi- to Single-Modal Teacher-Student Mutual Learning
AAAI 2021
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
AAAI 2021
Similarity Reasoning and Filtration for Image-Text Matching
AAAI 2021
Move2Hear: Active Audio-Visual Source Separation
ICCV 2021
Multimodal Clustering Networks for Self-Supervised Learning From Unlabeled Videos
ICCV 2021