Multimodal Learning

13,057 papers

Papers

CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection ACL 2025

GIMMICK: Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking ACL 2025

R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding ACL 2025

MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression ACL 2025

Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues ACL 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ACL 2025

SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information ACL 2025

Unlocking Speech Instruction Data Potential with Query Rewriting ACL 2025

Self-play through Computational Runtimes improves Chart Reasoning ACL 2025

CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations ACL 2025

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks ACL 2025

Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models ACL 2025

Listen, Watch, and Learn to Feel: Retrieval-Augmented Emotion Reasoning for Compound Emotion Generation ACL 2025

Large Language Models Are Natural Video Popularity Predictors ACL 2025

A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges ACL 2025

VAQUUM: Are Vague Quantifiers Grounded in Visual Data? ACL 2025

Forgotten Polygons: Multimodal Large Language Models are Shape-Blind ACL 2025

FIHA: Automated Fine-grained Hallucinations Evaluations in Large Vision Language Models with Davidson Scene Graphs ACL 2025

Forget the Token and Pixel: Rethinking Gradient Ascent for Concept Unlearning in Multimodal Generative Models ACL 2025

AIGuard: A Benchmark and Lightweight Detection for E-commerce AIGC Risks ACL 2025

CoT-VTM: Visual-to-Music Generation with Chain-of-Thought Reasoning ACL 2025

MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models ACL 2025

Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension? ACL 2025

Grounding Task Assistance with Multimodal Cues from a Single Demonstration ACL 2025

I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue ACL 2025