Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis
AAAI 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
AAAI 2025
Investigating Language Preference of Multilingual RAG Systems
ACL 2025
Dynamic Syntactic Feature Filtering and Injecting Networks for Cross-lingual Dependency Parsing
AAAI 2025
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
AAAI 2025
Multi-View Empowered Structural Graph Wordification for Language Models
AAAI 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI 2025
DialogDraw: Image Generation and Editing System Based on Multi-Turn Dialogue
AAAI 2025
Tensorized Attention for Understanding Multi-Object Relationships
AAAI 2025
GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
AAAI 2025
GENTEEL-NEGOTIATOR: LLM-Enhanced Mixture-of-Expert-Based Reinforcement Learning Approach for Polite Negotiation Dialogue
AAAI 2025
A New Formula for Sticker Retrieval: Reply with Stickers in Multi-Modal and Multi-Session Conversation
AAAI 2025
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
AAAI 2025
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
AAAI 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
AAAI 2025
UniMuMo: Unified Text, Music, and Motion Generation
AAAI 2025
MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition
AAAI 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
AAAI 2025
Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning
AAAI 2025
Practicable Black-Box Evasion Attacks on Link Prediction in Dynamic Graphs—a Graph Sequential Embedding Method
AAAI 2025
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
AAAI 2025
Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update
AAAI 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
CVPR 2025