Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
MMToM-QA: Multimodal Theory of Mind Question Answering
ACL 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
ACL 2024
Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
ACL 2024
Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability
ACL 2024
Generating Human Motion in 3D Scenes from Text Descriptions
CVPR 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
ACL 2024
Learning to Decode Collaboratively with Multiple Language Models
ACL 2024
Visual Hallucinations of Multi-modal Large Language Models
ACL 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
ACL 2024
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models
ACL 2024
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
ACL 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
ACL 2024
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
CVPR 2024
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
CVPR 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
CVPR 2024
Diving Deep into the Motion Representation of Video-Text Models
ACL 2024
Mask Grounding for Referring Image Segmentation
CVPR 2024
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
ACL 2024
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
CVPR 2024
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
A Vision Check-up for Language Models
CVPR 2024
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
CVPR 2024
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
CVPR 2024
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
CVPR 2024