conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG
ACL 2025
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment
ACL 2025
Design Choices for Extending the Context Length of Visual Language Models
ACL 2025
Automatic detection of dyslexia based on eye movements during reading in Russian
ACL 2025
BQA: Body Language Question Answering Dataset for Video Large Language Models
ACL 2025
Grounded, or a Good Guesser? A Per-Question Balanced Dataset to Separate Blind from Grounded Models for Embodied Question Answering
ACL 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
ACL 2025
Towards Geo-Culturally Grounded LLM Generations
ACL 2025
Do Multimodal Large Language Models Truly See What We Point At? Investigating Indexical, Iconic, and Symbolic Gesture Comprehension
ACL 2025
Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering
ACL 2025
Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model
ACL 2025
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
ACL 2025
Transferring Textual Preferences to Vision-Language Understanding through Model Merging
ACL 2025
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
ACL 2025
MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore
ACL 2025
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation
ACL 2025
DEEP: an automatic bidirectional translator leveraging an ASR for translation of Italian sign language
ACL 2025
FORG3D: Flexible Object Rendering for Generating Vision-Language Spatial Reasoning Data from 3D Scenes
ACL 2025
FlagEval-Arena: A Side-by-Side Comparative Evaluation Platform for Large Language Models and Text-Driven AIGC
ACL 2025
FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation
ACL 2025
Transforming Brainwaves into Language: EEG Microstates Meet Text Embedding Models for Dementia Detection
ACL 2025
Voices of Dissent: A Multimodal Analysis of Protest Songs through Lyrics and Audio
ACL 2025
Chart Question Answering from Real-World Analytical Narratives
ACL 2025
DRUM: Learning Demonstration Retriever for Large MUlti-modal Models
ACL 2025
Time-LlaMA: Adapting Large Language Models for Time Series Modeling via Dynamic Low-rank Adaptation
ACL 2025