conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
CMIE: Combining MLLM Insights with External Evidence for Explainable Out-of-Context Misinformation Detection
ACL 2025
GIMMICK: Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
ACL 2025
R-VLM: Region-Aware Vision Language Model for Precise GUI Grounding
ACL 2025
MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression
ACL 2025
Akan Cinematic Emotions (ACE): A Multimodal Multi-party Dataset for Emotion Recognition in Movie Dialogues
ACL 2025
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
ACL 2025
SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
ACL 2025
Unlocking Speech Instruction Data Potential with Query Rewriting
ACL 2025
Self-play through Computational Runtimes improves Chart Reasoning
ACL 2025
CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations
ACL 2025
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks
ACL 2025
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models
ACL 2025
Listen, Watch, and Learn to Feel: Retrieval-Augmented Emotion Reasoning for Compound Emotion Generation
ACL 2025
Large Language Models Are Natural Video Popularity Predictors
ACL 2025
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges
ACL 2025
VAQUUM: Are Vague Quantifiers Grounded in Visual Data?
ACL 2025
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
ACL 2025
FIHA: Automated Fine-grained Hallucinations Evaluations in Large Vision Language Models with Davidson Scene Graphs
ACL 2025
Forget the Token and Pixel: Rethinking Gradient Ascent for Concept Unlearning in Multimodal Generative Models
ACL 2025
AIGuard: A Benchmark and Lightweight Detection for E-commerce AIGC Risks
ACL 2025
CoT-VTM: Visual-to-Music Generation with Chain-of-Thought Reasoning
ACL 2025
MDIT-Bench: Evaluating the Dual-Implicit Toxicity in Large Multimodal Models
ACL 2025
Express What You See: Can Multimodal LLMs Decode Visual Ciphers with Intuitive Semiosis Comprehension?
ACL 2025
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
ACL 2025
I see what you mean: Co-Speech Gestures for Reference Resolution in Multimodal Dialogue
ACL 2025