conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts
EMNLP 2023
IMU2CLIP: Language-grounded Motion Sensor Translation with Multimodal Contrastive Learning
EMNLP 2023
Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation
EMNLP 2023
Delivering Arbitrary-Modal Semantic Segmentation
CVPR 2023
Position-Guided Text Prompt for Vision-Language Pre-Training
CVPR 2023
Pic2Word: Mapping Pictures to Words for Zero-Shot Composed Image Retrieval
CVPR 2023
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring
CVPR 2023
SceneTrilogy: On Human Scene-Sketch and Its Complementarity With Photo and Text
CVPR 2023
Improving Selective Visual Question Answering by Learning From Your Peers
CVPR 2023
Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence
CVPR 2023
Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding
CVPR 2023
Efficient Multimodal Fusion via Interactive Prompting
CVPR 2023
HierVL: Learning Hierarchical Video-Language Embeddings
CVPR 2023
CelebV-Text: A Large-Scale Facial Text-Video Dataset
CVPR 2023
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
CVPR 2023
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
CVPR 2023
Multilateral Semantic Relations Modeling for Image Text Retrieval
CVPR 2023
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection
CVPR 2023
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval
CVPR 2023
Text With Knowledge Graph Augmented Transformer for Video Captioning
CVPR 2023
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
CVPR 2023
Test of Time: Instilling Video-Language Models With a Sense of Time
CVPR 2023
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks
CVPR 2023
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
CVPR 2023
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training
CVPR 2023