conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?
ACL 2025
LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study
ACL 2025
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
ACL 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
ACL 2025
Automated CAD Modeling Sequence Generation from Text Descriptions via Transformer-Based Large Language Models
ACL 2025
Knowledge Image Matters: Improving Knowledge-Based Visual Reasoning with Multi-Image Large Language Models
ACL 2025
GUICourse: From General Vision Language Model to Versatile GUI Agent
ACL 2025
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ACL 2025
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment
ACL 2025
Proxy-Driven Robust Multimodal Sentiment Analysis with Incomplete Data
ACL 2025
Disentangling Language and Culture for Evaluating Multilingual Large Language Models
ACL 2025
Caution for the Environment: Multimodal LLM Agents are Susceptible to Environmental Distractions
ACL 2025
ChartLens: Fine-grained Visual Attribution in Charts
ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
ACL 2025
The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights
ACL 2025
Does the Emotional Understanding of LVLMs Vary Under High-Stress Environments and Across Different Demographic Attributes?
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
ACL 2025
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
ACL 2025
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models
ACL 2025
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
ACL 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
ACL 2025
WAFFLE: Fine-tuning Multi-Modal Model for Automated Front-End Development
ACL 2025
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
ACL 2025