conftrace_

multimodal learning

4622 papers

Co-occurring keywords

large language model (12755), vision-language model (2235), visual question answering (1000), video understanding (1647), multi-modal learning (1276), contrastive learning (3979), representation learning (6174), transfer learning (5442), zero-shot learning (3637), vision language model (752)

Papers

Multimodal Retrieval-Augmented Generation: Unified Information Processing Across Text, Image, Table, and Video Modalities ACL 2025

CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ) ACL 2025

CTYUN-AI at SemEval-2025 Task 1: Learning to Rank for Idiomatic Expressions ACL 2025

PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment ACL 2025

JNLP at SemEval-2025 Task 1: Multimodal Idiomaticity Representation with Large Language Models ACL 2025

PoliTo at SemEval-2025 Task 1: Beyond Literal Meaning: A Chain-of-Though Approach for Multimodal Idiomacity Understanding ACL 2025

HiTZ-Ixa at SemEval-2025 Task 1: Multimodal Idiomatic Language Understanding ACL 2025

Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models ACL 2025

Overview of MM-ArgFallacy2025 on Multimodal Argumentative Fallacy Detection and Classification in Political Debates ACL 2025

Multimodal Argumentative Fallacy Classification in Political Debates ACL 2025

Prompt-Guided Augmentation and Multi-modal Fusion for Argumentative Fallacy Classification in Political Debates ACL 2025

Leveraging Context for Multimodal Fallacy Classification in Political Debates ACL 2025

Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data ACL 2025

REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLMs IJCAI 2025

Modgenix at SemEval-2025 Task 1: Context Aware Vision Language Ranking (CAViLR) for Multimodal Idiomaticity Understanding ACL 2025

Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control AAAI 2025

CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information AAAI 2025

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark AAAI 2025

Evaluating LLM-Generated Diagrams as Graphs EMNLP 2025

Debiased Multimodal Understanding for Human Language Sequences AAAI 2025

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living CVPR 2025

MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents EMNLP 2025

Enhancing Large Language Models for Scientific Multimodal Summarization with Multimodal Output COLING 2025

XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser COLING 2025

Spatial Alignment and Temporal Matching Adapter for Video-Radar Remote Physiological Measurement ICCV 2025