conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
RAVEN: Robust Advertisement Video Violation Temporal Grounding via Reinforcement Reasoning
ACL 2025
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model
ACL 2025
Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection
ACL 2025
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
ACL 2025
Filter-And-Refine: A MLLM Based Cascade System for Industrial-Scale Video Content Moderation
ACL 2025
A Framework for Flexible Extraction of Clinical Event Contextual Properties from Electronic Health Records
ACL 2025
MICE: Mixture of Image Captioning Experts Augmented e-Commerce Product Attribute Value Extraction
ACL 2025
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
ACL 2025
SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning
ACL 2025
From Recall to Creation: Generating Follow-Up Questions Using Bloom’s Taxonomy and Grice’s Maxims
ACL 2025
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine
ACL 2025
EcoDoc: A Cost-Efficient Multimodal Document Processing System for Enterprises Using LLMs
ACL 2025
Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task
ACL 2025
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
ACL 2025
Detecting and Mitigating Challenges in Zero-Shot Video Summarization with Video LLMs
ACL 2025
Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery
ACL 2025
scRAG: Hybrid Retrieval-Augmented Generation for LLM-based Cross-Tissue Single-Cell Annotation
ACL 2025
Improve Language Model and Brain Alignment via Associative Memory
ACL 2025
Towards Reliable Large Audio Language Model
ACL 2025
MUSE: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles
ACL 2025
GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models
ACL 2025
Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling
ACL 2025
UQ-Merge: Uncertainty Guided Multimodal Large Language Model Merging
ACL 2025
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control
ACL 2025
A Character-Centric Creative Story Generation via Imagination
ACL 2025