Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
CVPR 2024
Collecting High-quality Multi-modal Conversational Search Data for E-Commerce
ACL 2024
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
CVPR 2024
Real-Time Simulated Avatar from Head-Mounted Sensors
CVPR 2024
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
CVPR 2024
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
ACL 2024
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
IJCAI 2024
Inter-X: Towards Versatile Human-Human Interaction Analysis
CVPR 2024
DiaLoc: An Iterative Approach to Embodied Dialog Localization
CVPR 2024
Semantics-aware Motion Retargeting with Vision-Language Models
CVPR 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
ACL 2024
Multi-modal Stance Detection: New Datasets and Model
ACL 2024
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
CVPR 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
ACL 2024
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
ACL 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
ACL 2024
Learning to Decode Collaboratively with Multiple Language Models
ACL 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
ACL 2024
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
ACL 2024
CLASP: Cross-modal Alignment Using Pre-trained Unimodal Models
ACL 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
ACL 2024
UniHuman: A Unified Model For Editing Human Images in the Wild
CVPR 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
WonderJourney: Going from Anywhere to Everywhere
CVPR 2024