conftrace
Multimodal Learning
13,057 papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Audio-Language Models
ACL 2025
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
ACL 2025
MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency
ACL 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
ACL 2025
Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
ACL 2025
Code-SPA: Style Preference Alignment to Large Language Models for Effective and Robust Code Debugging
ACL 2025
Sign2Vis: Automated Data Visualization from Sign Language
ACL 2025
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
ACL 2025
Generative Frame Sampler for Long Video Understanding
ACL 2025
VISIAR: Empower MLLM for Visual Story Ideation
ACL 2025
Biases Propagate in Encoder-based Vision-Language Models: A Systematic Analysis From Intrinsic Measures to Zero-shot Retrieval Outcomes
ACL 2025
Can Multimodal Foundation Models Understand Schematic Diagrams? An Empirical Study on Information-Seeking QA over Scientific Papers
ACL 2025
MVTamperBench: Evaluating Robustness of Vision-Language Models
ACL 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
ACL 2025
Vision-Language Models Struggle to Align Entities across Modalities
ACL 2025
V-ALPHASOCIAL: Benchmark and Self-Reflective Chain-of-Thought Generation for Visual Social Commonsense Reasoning
ACL 2025
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
ACL 2025
From Observation to Understanding: Front-Door Adjustments with Uncertainty Calibration for Enhancing Egocentric Reasoning in LVLMs
ACL 2025
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA
ACL 2025
EgoNormia: Benchmarking Physical-Social Norm Understanding
ACL 2025
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
ACL 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
ACL 2025
LLM-Symbolic Integration for Robust Temporal Tabular Reasoning
ACL 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
ACL 2025
PruneVid: Visual Token Pruning for Efficient Video Large Language Models
ACL 2025