Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
CVPR 2025
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
CVPR 2025
Finding Needles in Images: Can Multi-modal LLMs Locate Fine Details?
ACL 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
ACL 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
ACL 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
ACL 2025
Walk in Others’ Shoes with a Single Glance: Human-Centric Visual Grounding with Top-View Perspective Transformation
ACL 2025
Exploring Multimodal Relation Extraction of Hierarchical Tabular Data with Multi-task Learning
ACL 2025
CheXalign: Preference fine-tuning in chest X-ray interpretation models without human feedback
ACL 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
ACL 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
ACL 2025
CoachMe: Decoding Sport Elements with a Reference-Based Coaching Instruction Generation Model
ACL 2025
Predicting Implicit Arguments in Procedural Video Instructions
ACL 2025
SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
ACL 2025
HintsOfTruth: A Multimodal Checkworthiness Detection Dataset with Real and Synthetic Claims
ACL 2025
Hidden in Plain Sight: Evaluation of the Deception Detection Capabilities of LLMs in Multimodal Settings
ACL 2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
ACL 2025
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
ACL 2025
S3E: Self-Supervised State Estimation for Radar-Inertial System
ICCV 2025
Grounded, or a Good Guesser? A Per-Question Balanced Dataset to Separate Blind from Grounded Models for Embodied Question Answering
ACL 2025
WinSpot: GUI Grounding Benchmark with Multimodal Large Language Models
ACL 2025
DRUM: Learning Demonstration Retriever for Large MUlti-modal Models
ACL 2025
Do Multimodal Large Language Models Truly See What We Point At? Investigating Indexical, Iconic, and Symbolic Gesture Comprehension
ACL 2025
Towards Geo-Culturally Grounded LLM Generations
ACL 2025
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
CVPR 2025