Research Explorer
Multi-Modal Learning
3194 directly classified papers
Papers per year
2003: 1
2010: 1
2011: 1
2013: 5
2014: 3
2015: 9
2016: 23
2017: 49
2018: 78
2019: 158
2020: 223
2021: 261
2022: 354
2023: 471
2024: 705
2025: 835
2026: 17
Papers
CALVIN: Improved Contextual Video Captioning via Instruction Tuning
NIPS 2024
Generating Illustrated Instructions
CVPR 2024
Extending Multi-modal Contrastive Representations
NIPS 2024
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling
WACV 2024
Multilingual Diversity Improves Vision-Language Representations
NIPS 2024
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
CVPR 2024
Octopus: A Multi-modal LLM with Parallel Recognition and Sequential Understanding
NIPS 2024
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
CVPR 2024
SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views
AAAI 2024
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
CVPR 2024
Mutual-Modality Adversarial Attack with Semantic Perturbation
AAAI 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
NIPS 2024
WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games
NIPS 2024
Prompting Multi-Modal Image Segmentation with Semantic Grouping
AAAI 2024
HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model
NIPS 2024
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
NIPS 2024
SDMTR: A Brain-inspired Transformer for Relation Inference
AISTATS 2024
Language Model Guided Interpretable Video Action Reasoning
CVPR 2024
COMMA: Co-articulated Multi-Modal Learning
AAAI 2024
AFBench: A Large-scale Benchmark for Airfoil Design
NIPS 2024
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning
NIPS 2024
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
CVPR 2024
RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
NIPS 2024
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
NIPS 2024
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI 2024