Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine
AAAI 2025
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
AAAI 2025
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
ACL 2025
Aligning VLM Assistants with Personalized Situated Cognition
ACL 2025
Differentiated Vision: Unveiling Entity-Specific Visual Modality Requirements for Multimodal Knowledge Graph
EMNLP 2025
Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning
ACL 2025
Attacking Vision-Language Computer Agents via Pop-ups
ACL 2025
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
ACL 2025
MCS-Bench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in Chinese Classical Studies
ACL 2025
Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis
ACL 2025
Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering
ACL 2025
Activation Steering Decoding: Mitigating Hallucination in Large Vision-Language Models through Bidirectional Hidden State Intervention
ACL 2025
FlashAudio: Rectified Flow for Fast and High-Fidelity Text-to-Audio Generation
ACL 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
ACL 2025
MM-R3: On (In-)Consistency of Vision-Language Models (VLMs)
ACL 2025
Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback
ACL 2025
Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
ACL 2025
CLAIM: Mitigating Multilingual Object Hallucination in Large Vision-Language Models with Cross-Lingual Attention Intervention
ACL 2025
Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging
ACL 2025
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification
ACL 2025
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought
ACL 2025
Incongruity-aware Tension Field Network for Multi-modal Sarcasm Detection
ACL 2025
Can MLLMs Understand the Deep Implication Behind Chinese Images?
ACL 2025
HiddenDetect: Detecting Jailbreak Attacks against Multimodal Large Language Models via Monitoring Hidden States
ACL 2025
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
CVPR 2025