Research Explorer
Multi-Modal Learning
1213 directly classified papers
Papers per year
2007: 2
2008: 1
2009: 1
2011: 2
2012: 5
2013: 5
2014: 1
2015: 5
2016: 8
2017: 21
2018: 42
2019: 42
2020: 69
2021: 72
2022: 149
2023: 143
2024: 258
2025: 370
2026: 17
Papers
EASUM: Enhancing Affective State Understanding Through Joint Sentiment and Emotion Modeling for Multimodal Tasks
WACV 2024
The Interspeech 2024 TAUKADIAL Challenge: Multilingual Mild Cognitive Impairment Detection with Multimodal Approach
INTERSPEECH 2024
ShapeWalk: Compositional Shape Editing Through Language-Guided Chains
CVPR 2024
A Cross-Attention Layer coupled with Multimodal Fusion Methods for Recognizing Depression from Spontaneous Speech
INTERSPEECH 2024
Multi-Source Domain Adaptation for Object Detection With Prototype-Based Mean Teacher
WACV 2024
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
CVPR 2024
OmniVec: Learning Robust Representations With Cross Modal Sharing
WACV 2024
SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
CVPR 2024
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
CVPR 2024
Can CLIP Help Sound Source Localization?
WACV 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
CVPR 2024
Language Models as Black-Box Optimizers for Vision-Language Models
CVPR 2024
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
INTERSPEECH 2024
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
WACV 2024
MIVC: Multiple Instance Visual Component for Visual-Language Models
WACV 2024
LLMs are Good Action Recognizers
CVPR 2024
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
CVPR 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
CVPR 2024
Question Aware Vision Transformer for Multimodal Reasoning
CVPR 2024
Improving Vision-and-Language Reasoning via Spatial Relations Modeling
WACV 2024
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
CVPR 2024
CAMOT: Camera Angle-Aware Multi-Object Tracking
WACV 2024
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
CVPR 2024
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
CVPR 2024
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
WACV 2024