Multi-Modal Learning

3194 directly classified papers

Papers

CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models ACL 2025

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling ACL 2025

HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims ACL 2025

Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions ACL 2025

ConECT Dataset: Overcoming Data Scarcity in Context-Aware E-Commerce MT ACL 2025

NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model ACL 2025

Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation ACL 2025

MotiR: Motivation-aware Retrieval for Long-Tail Recommendation ACL 2025

A Character-Centric Creative Story Generation via Imagination ACL 2025

Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion ACL 2025

Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation ACL 2025

DALR: Dual-level Alignment Learning for Multimodal Sentence Representation Learning ACL 2025

Multimodal Causal Reasoning Benchmark: Challenging Multimodal Large Language Models to Discern Causal Links Across Modalities ACL 2025

VCD: A Dataset for Visual Commonsense Discovery in Images ACL 2025

CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis ACL 2025

From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities ACL 2025

P²Net: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts ACL 2025

A Couch Potato is not a Potato on a Couch: Prompting Strategies, Image Generation, and Compositionality Prediction for Noun Compounds ACL 2025

VAQUUM: Are Vague Quantifiers Grounded in Visual Data? ACL 2025

MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching ACL 2025

VP-MEL: Visual Prompts Guided Multimodal Entity Linking ACL 2025

MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens ACL 2025

VideoRAG: Retrieval-Augmented Generation over Video Corpus ACL 2025

BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle ACL 2025

EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models ACL 2025