conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai
AACL 2025
The Visual Counter Turing Test (VCT²): A Benchmark for Evaluating AI-Generated Image Detection and the Visual AI Index (V_AI)
AACL 2025
Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions
AACL 2025
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
AACL 2025
INTERCHART: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
AACL 2025
What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization
AACL 2025
Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval
AACL 2025
VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation
AACL 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
AACL 2025
A Diagnostic Framework for Auditing Reference-Free Vision-Language Metrics
AACL 2025
Adaptive Collaborative Labeling with MLLMs for Low-Resource Multimodal Emotion Recognition
AACL 2025
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
AACL 2025
MuSciClaims: Multimodal Scientific Claim Verification
AACL 2025
Video-guided Machine Translation: A Survey of Models, Datasets, and Challenges
AACL 2025
Referring Expressions as a Lens into Spatial Language Grounding in Vision-Language Models
AACL 2025
Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective
AACL 2025
Indic-S2ST: a Multilingual and Multimodal Many-to-Many Indic Speech-to-Speech Translation Dataset
AACL 2025
Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness
AACL 2025
Seeing isn’t Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms
AACL 2025
Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking
AACL 2025
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
AACL 2025
PerMed-MM: A Multimodal, Multi-Specialty Persian Medical Benchmark for Evaluating Vision Language Models
AACL 2025
What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction
AACL 2025
HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval
ACL 2025
Can Multimodal Large Language Models Understand Spatial Relations?
ACL 2025