conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
ACL 2025
Can Graph Descriptive Order Affect Solving Graph Problems with LLMs?
ACL 2025
Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
ACL 2025
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
ACL 2025
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
ACL 2025
FloorPlan-LLaMa: Aligning Architects’ Feedback and Domain Knowledge in Architectural Floor Plan Generation
ACL 2025
TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding
ACL 2025
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
ACL 2025
Rolling the DICE on Idiomaticity: How LLMs Fail to Grasp Context
ACL 2025
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
ACL 2025
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
ACL 2025
VLM2-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
ACL 2025
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
ACL 2025
AXIS: Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
ACL 2025
SpaRE: Enhancing Spatial Reasoning in Vision-Language Models with Synthetic Data
ACL 2025
R2-MultiOmnia: Leading Multilingual Multimodal Reasoning via Self-Training
ACL 2025
VLSBench: Unveiling Visual Leakage in Multimodal Safety
ACL 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
ACL 2025
Revisiting Classical Chinese Event Extraction with Ancient Literature Information
ACL 2025
A Survey on Patent Analysis: From NLP to Multimodal AI
ACL 2025
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
ACL 2025
COSMMIC: Comment-Sensitive Multimodal Multilingual Indian Corpus for Summarization and Headline Generation
ACL 2025
Mind the Gap: Static and Interactive Evaluations of Large Audio Models
ACL 2025
Identifying Cellular Niches in Spatial Transcriptomics: An Investigation into the Capabilities of Large Language Models
ACL 2025
Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
ACL 2025