conftrace_

multimodal learning

4645 papers

Co-occurring keywords

large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)

Papers

Continuous Attentive Multimodal Prompt Tuning for Few-Shot Multimodal Sarcasm Detection EMNLP 2024

Enhancing Question Answering on Charts Through Effective Pre-training Tasks EMNLP 2024

Recent Advances in Online Hate Speech Moderation: Multimodality and the Role of Large Models EMNLP 2024

HOTVCOM: Generating Buzzworthy Comments for Videos ACL 2024

Advancement in Graph Understanding: A Multimodal Benchmark and Fine-Tuning of Vision-Language Models ACL 2024

Event-Radar: Event-driven Multi-View Learning for Multimodal Fake News Detection ACL 2024

Emosical: An Emotion-Annotated Musical Theatre Dataset EMNLP 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference EMNLP 2024

Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models EMNLP 2024

V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization EMNLP 2024

SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations NAACL 2024

Android in the Zoo: Chain-of-Action-Thought for GUI Agents EMNLP 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy NIPS 2024

Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs EMNLP 2024

Visual Pivoting Unsupervised Multimodal Machine Translation in Low-Resource Distant Language Pairs EMNLP 2024

MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models IJCAI 2024

Geneverse: A Collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research EMNLP 2024

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM EMNLP 2024

How Control Information Influences Multilingual Text Image Generation and Editing? NIPS 2024

PRISM: A New Lens for Improved Color Understanding EMNLP 2024

Training-free Deep Concept Injection Enables Language Models for Video Question Answering EMNLP 2024

T3M: Text Guided 3D Human Motion Synthesis from Speech NAACL 2024

On the Robustness of Large Multimodal Models Against Image Adversarial Attacks CVPR 2024

Generative Multi-modal Models are Good Class Incremental Learners CVPR 2024

A Vision Check-up for Language Models CVPR 2024