conftrace
multimodal learning
4645 papers
Co-occurring keywords
large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)
Papers
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
NIPS 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
NIPS 2024
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
NIPS 2024
Voila-A: Aligning Vision-Language Models with User's Gaze Attention
NIPS 2024
GPT4MTS: Prompt-based Large Language Model for Multimodal Time-series Forecasting
AAAI 2024
CLIPSyntel: CLIP and LLM Synergy for Multimodal Question Summarization in Healthcare
AAAI 2024
Video-Context Aligned Transformer for Video Question Answering
AAAI 2024
SECap: Speech Emotion Captioning with Large Language Model
AAAI 2024
Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
NIPS 2024
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning
EMNLP 2024
SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions
EMNLP 2024
Exploring the Potential of Multimodal LLM with Knowledge-Intensive Multimodal ASR
EMNLP 2024
nowhash at SemEval-2024 Task 4: Exploiting Fusion of Transformers for Detecting Persuasion Techniques in Multilingual Memes
NAACL 2024
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
NAACL 2024
Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification
IJCAI 2024
SoMeLVLM: A Large Vision Language Model for Social Media Processing
ACL 2024
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model
ACL 2024
iHealth-Chile-1 at RRG24: In-context Learning and Finetuning of a Large Multimodal Model for Radiology Report Generation
ACL 2024
How and where does CLIP process negation?
ACL 2024
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion.
CVPR 2024
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
CVPR 2024
Can Large Multimodal Models Uncover Deep Semantics Behind Images?
ACL 2024
Faithful Chart Summarization with ChaTS-Pi
ACL 2024
Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment
ACL 2024
Diffusion Mask-Driven Visual-language Tracking
IJCAI 2024