Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer
AAAI 2024
Semantics-aware Motion Retargeting with Vision-Language Models
CVPR 2024
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
CVPR 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
CVPR 2024
Can't Make an Omelette Without Breaking Some Eggs: Plausible Action Anticipation Using Large Video-Language Models
CVPR 2024
Transferable Video Moment Localization by Moment-Guided Query Prompting
AAAI 2024
Revisiting Counterfactual Problems in Referring Expression Comprehension
CVPR 2024
CCEdit: Creative and Controllable Video Editing via Diffusion Models
CVPR 2024
Image as a Language: Revisiting Scene Text Recognition via Balanced, Unified and Synchronized Vision-Language Reasoning Network
AAAI 2024
Towards Better Vision-Inspired Vision-Language Models
CVPR 2024
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
CVPR 2024
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
CVPR 2024
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling
AAAI 2024
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
CVPR 2024
Learning Group Activity Features Through Person Attribute Prediction
CVPR 2024
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
CVPR 2024
Weakly Supervised Multimodal Affordance Grounding for Egocentric Images
AAAI 2024
Vision-and-Language Navigation via Causal Learning
CVPR 2024
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
CVPR 2024
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
CVPR 2024
Mask Grounding for Referring Image Segmentation
CVPR 2024
GLaMM: Pixel Grounding Large Multimodal Model
CVPR 2024
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation
AAAI 2024
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
CVPR 2024
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
CVPR 2024