Research Explorer
Papers
Trends
Conferences
Explore
Authors
Topics
Keywords
Achievements
About
Methodology
Multimodal Learning
13057 directly classified papers
Papers per year
2003: 1
2006: 3
2007: 6
2008: 2
2009: 5
2010: 2
2011: 3
2012: 6
2013: 24
2014: 20
2015: 46
2016: 109
2017: 205
2018: 299
2019: 622
2020: 675
2021: 987
2022: 1084
2023: 1697
2024: 2500
2025: 3654
2026: 1107
Papers
MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition
AAAI 2025
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues
ACL 2025
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AAAI 2025
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
AAAI 2025
McHirc: A Multimodal Benchmark for Chinese Idiom Reading Comprehension
AAAI 2025
Open-World Attribute Mining for E-Commerce Products with Multimodal Self-Correction Instruction Tuning
ACL 2025
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
AAAI 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
ICCV 2025
SDMatte: Grafting Diffusion Models for Interactive Matting
ICCV 2025
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
AAAI 2025
Explicitly Guided Difficulty-Controllable Visual Question Generation
AAAI 2025
Prototype-Guided Multimodal Relation Extraction based on Entity Attributes
AAAI 2025
Leveraging Computer Vision and Visual LLMs for Cost-Effective and Consistent Street Food Safety Assessment in Kolkata India
AAAI 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
Language Model Can Listen While Speaking
AAAI 2025
GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
AAAI 2025
Multi-View Empowered Structural Graph Wordification for Language Models
AAAI 2025
Dual Reciprocal Learning of Language-based Human Motion Understanding and Generation
ICCV 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI 2025
Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation
AAAI 2025
DEQA: Descriptions Enhanced Question-Answering Framework for Multimodal Aspect-Based Sentiment Analysis
AAAI 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
AAAI 2025
LRM-LLaVA: Overcoming the Modality Gap of Multilingual Large Language-Vision Model for Low-Resource Languages
AAAI 2025
Information Density Principle for MLLM Benchmarks
ICCV 2025
Free-MoRef: Instantly Multiplexing Context Perception Capabilities of Video-MLLMs within Single Inference
ICCV 2025