Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
CVPR 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
CVPR 2024
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
CVPR 2024
A Vision Check-up for Language Models
CVPR 2024
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
CVPR 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
CVPR 2024
Can I Trust Your Answer? Visually Grounded Video Question Answering
CVPR 2024
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
CVPR 2024
Generation of Visual Representations for Multi-Modal Mathematical Knowledge
AAAI 2024
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
CVPR 2024
Visual Language – Let the Product Say What You Want
AAAI 2024
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
CVPR 2024
Detours for Navigating Instructional Videos
CVPR 2024
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
CVPR 2024
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
CVPR 2024
CCEdit: Creative and Controllable Video Editing via Diffusion Models
CVPR 2024
Tools Identification By On-Board Adaptation of Vision-and-Language Models
AAAI 2024
Language Models as Black-Box Optimizers for Vision-Language Models
CVPR 2024
Universal Segmentation at Arbitrary Granularity with Language Instruction
CVPR 2024
A Hybrid AI Framework for Sensor-Based Personal Health Monitoring towards Precision Health
AAAI 2024
VidLA: Video-Language Alignment at Scale
CVPR 2024
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
CVPR 2024
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
CVPR 2024
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
AAAI 2024
Semantics-aware Motion Retargeting with Vision-Language Models
CVPR 2024