Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
FactDebug at SemEval-2025 Task 7: Hybrid Retrieval Pipeline for Identifying Previously Fact-Checked Claims Across Multiple Languages
ACL 2025
Howard University-AI4PC at SemEval-2025 Task 1: Using GPT-4o and CLIP-ViLT to Decode Figurative Language Across Text and Images
ACL 2025
PALI-NLP at SemEval 2025 Task 1: Multimodal Idiom Recognition and Alignment
ACL 2025
AIMA at SemEval-2025 Task 1: Bridging Text and Image for Idiomatic Knowledge Extraction via Mixture of Experts
ACL 2025
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
ACL 2025
Benchmarking Table Extraction: Multimodal LLMs vs Traditional OCR
ACL 2025
DecepBench: Benchmarking Multimodal Deception Detection
ACL 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives
ICCV 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
ACL 2025
Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering
ACL 2025
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
CVPR 2024
Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking
ACL 2024
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
CVPR 2024
Tell Me What’s Next: Textual Foresight for Generic UI Representations
ACL 2024
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas
EMNLP 2024
Improved Visual Grounding through Self-Consistent Explanations
CVPR 2024
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
CVPR 2024
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
CVPR 2024
Iterated Learning Improves Compositionality in Large Vision-Language Models
CVPR 2024
iKUN: Speak to Trackers without Retraining
CVPR 2024
Autonomous Workflow for Multimodal Fine-Grained Training Assistants Towards Mixed Reality
ACL 2024
3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset
IJCAI 2024
Incorporating Syntax and Lexical Knowledge to Multilingual Sentiment Classification on Large Language Models
ACL 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
CVPR 2024
Open Vocabulary Semantic Scene Sketch Understanding
CVPR 2024