Multimodal Learning

13,057 papers

Papers

In Search of the Lost Arch in Dialogue: A Dependency Dialogue Acts Corpus for Multi-Party Dialogues ACL 2025

InImageTrans: Multimodal LLM-based Text Image Machine Translation ACL 2025

When Large Language Models Meet Speech: A Survey on Integration Approaches ACL 2025

A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment ACL 2025

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification ACL 2025

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents ACL 2025

CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation ACL 2025

READoc: A Unified Benchmark for Realistic Document Structured Extraction ACL 2025

Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding ACL 2025

BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle ACL 2025

Metagent-P: A Neuro-Symbolic Planning Agent with Metacognition for Open Worlds ACL 2025

DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios ACL 2025

Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation ACL 2025

LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding ACL 2025

YinYang-Align: A new Benchmark for Competing Objectives and Introducing Multi-Objective Preference based Text-to-Image Alignment ACL 2025

Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ACL 2025

Improving MLLM’s Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency ACL 2025

iMOVE : Instance-Motion-Aware Video Understanding ACL 2025

Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks? ACL 2025

Latent Distribution Decouple for Uncertain-Aware Multimodal Multi-label Emotion Recognition ACL 2025

Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era ACL 2025

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs ACL 2025

Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? ACL 2025

Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models ACL 2025

WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts ACL 2025