conftrace_

multimodal learning

4645 papers

Co-occurring keywords

large language model (13587)
vision-language model (2348)
visual question answering (1017)
video understanding (1658)
multi-modal learning (1278)
contrastive learning (4032)
representation learning (6206)
transfer learning (5449)
zero-shot learning (3650)
vision language model (767)

Papers

P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts ACL 2025

Forget the Token and Pixel: Rethinking Gradient Ascent for Concept Unlearning in Multimodal Generative Models ACL 2025

MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering ACL 2025

Sign2Vis: Automated Data Visualization from Sign Language ACL 2025

READoc: A Unified Benchmark for Realistic Document Structured Extraction ACL 2025

Latent Distribution Decouple for Uncertain-Aware Multimodal Multi-label Emotion Recognition ACL 2025

Can Vision Language Models Understand Mimed Actions? ACL 2025

Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation ACL 2025

Experiential Semantic Information and Brain Alignment: Are Multimodal Models Better than Language Models? ACL 2025

NAVER LABS Europe Submission to the Instruction-following Track ACL 2025

Quantifying Memorization and Parametric Response Rates in Retrieval-Augmented Vision-Language Models ACL 2025

Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models ACL 2025

Instruction-tuned QwenChart for Chart Question Answering ACL 2025

UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation ACL 2025

Zhoumou at SemEval-2025 Task 1: Leveraging Multimodal Data Augmentation and Large Language Models for Enhanced Idiom Understanding ACL 2025

Argumentative Fallacy Detection in Political Debates ACL 2025

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges ACL 2025

AIGuard: A Benchmark and Lightweight Detection for E-commerce AIGC Risks ACL 2025

Dynamic Graph Neural ODE Network for Multi-modal Emotion Recognition in Conversation COLING 2025

Acquired TASTE: Multimodal Stance Detection with Textual and Structural Embeddings COLING 2025

Improvement in Sign Language Translation Using Text CTC Alignment COLING 2025

Improving the Efficiency of Visually Augmented Language Models COLING 2025

Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images ACL 2025

Multimodal Aspect-Based Sentiment Analysis under Conditional Relation COLING 2025

Temporally Grounding Instructional Diagrams in Unconstrained Videos WACV 2025