conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
In Search of the Lost Arch in Dialogue: A Dependency Dialogue Acts Corpus for Multi-Party Dialogues
ACL 2025
InImageTrans: Multimodal LLM-based Text Image Machine Translation
ACL 2025
When Large Language Models Meet Speech: A Survey on Integration Approaches
ACL 2025
A Comprehensive Graph Framework for Question Answering with Mode-Seeking Preference Alignment
ACL 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
ACL 2025
CliniDial: A Naturally Occurring Multimodal Dialogue Dataset for Team Reflection in Action During Clinical Operation
ACL 2025
READoc: A Unified Benchmark for Realistic Document Structured Extraction
ACL 2025
Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
ACL 2025
BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle
ACL 2025
Metagent-P: A Neuro-Symbolic Planning Agent with Metacognition for Open Worlds
ACL 2025
DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios
ACL 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
ACL 2025
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
ACL 2025
YinYang-Align: A new Benchmark for Competing Objectives and Introducing Multi-Objective Preference based Text-to-Image Alignment
ACL 2025
Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts
ACL 2025
Improving MLLM’s Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
ACL 2025
iMOVE: Instance-Motion-Aware Video Understanding
ACL 2025
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
ACL 2025
Latent Distribution Decouple for Uncertain-Aware Multimodal Multi-label Emotion Recognition
ACL 2025
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era
ACL 2025
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
ACL 2025
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
ACL 2025
Can VLMs Actually See and Read? A Survey on Modality Collapse in Vision-Language Models
ACL 2025
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
ACL 2025