Research Explorer
Multi-Modal Learning
1457 directly classified papers
Papers per year
2011: 1
2013: 4
2014: 3
2015: 3
2016: 9
2017: 11
2018: 27
2019: 61
2020: 109
2021: 87
2022: 153
2023: 213
2024: 391
2025: 384
2026: 1
Papers
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
CVPR 2024
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
CVPR 2024
iKUN: Speak to Trackers without Retraining
CVPR 2024
WebVLN: Vision-and-Language Navigation on Websites
AAAI 2024
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
CVPR 2024
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
CVPR 2024
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
CVPR 2024
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination
EMNLP 2024
LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
CVPR 2024
VIXEN: Visual Text Comparison Network for Image Difference Captioning
AAAI 2024
UniHuman: A Unified Model For Editing Human Images in the Wild
CVPR 2024
Analyzing Key Factors Influencing Emotion Prediction Performance of VLLMs in Conversational Contexts
EMNLP 2024
Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually
AAAI 2024
SEER: Backdoor Detection for Vision-Language Models through Searching Target Text and Image Trigger Jointly
AAAI 2024
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
CVPR 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
EMNLP 2024
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
CVPR 2024
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
CVPR 2024
Inter-X: Towards Versatile Human-Human Interaction Analysis
CVPR 2024
Revisiting motion information for RGB-Event tracking with MOT philosophy
NeurIPS 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
EMNLP 2024
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
CVPR 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
CVPR 2024
Beyond Embeddings: The Promise of Visual Table in Visual Reasoning
EMNLP 2024
Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning
EMNLP 2024