conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
SEA-VQA: Southeast Asian Cultural Context Dataset For Visual Question Answering
ACL 2024
MemeMind at ArAIEval Shared Task: Generative Augmentation and Feature Fusion for Multimodal Propaganda Detection in Arabic Memes through Advanced Language and Vision Models
ACL 2024
AlexUNLP-MZ at ArAIEval Shared Task: Contrastive Learning, LLM Features Extraction and Multi-Objective Optimization for Arabic Multi-Modal Meme Propaganda Detection
ACL 2024
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
AAAI 2024
Image Captioning with Multi-Context Synthetic Data
AAAI 2024
A Multimodal, Multi-Task Adapting Framework for Video Action Recognition
AAAI 2024
Structural Information Guided Multimodal Pre-training for Vehicle-Centric Perception
AAAI 2024
Weakly Supervised Multimodal Affordance Grounding for Egocentric Images
AAAI 2024
THGFormer: Time-Aware Hypergraph Learning for Multimodal Social Media Popularity Prediction (Student Abstract)
AAAI 2024
Harnessing CLIP for Evidence Identification in Scientific Literature: A Multimodal Approach to Context24 Shared Task
ACL 2024
Ancient Chinese Glyph Identification Powered by Radical Semantics
ACL 2024
Relational Distant Supervision for Image Captioning without Image-Text Pairs
AAAI 2024
HSDreport: Heart Sound Diagnosis with Echocardiography Reports
EMNLP 2024
Soft Knowledge Prompt: Help External Knowledge Become a Better Teacher to Instruct LLM in Knowledge-based VQA
ACL 2024
Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation
ACL 2024
TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging
EMNLP 2024
PolCLIP: A Unified Image-Text Word Sense Disambiguation Model via Generating Multimodal Complementary Representations
ACL 2024
From Sights to Insights: Towards Summarization of Multimodal Clinical Documents
ACL 2024
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling
ACL 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
ACL 2024
Findings of WASSA 2024 Shared Task on Empathy and Personality Detection in Interactions
ACL 2024
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
CVPR 2024
VideoCon: Robust Video-Language Alignment via Contrast Captions
CVPR 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
CVPR 2024
Making Visual Sense of Oracle Bones for You and Me
CVPR 2024