Multi-Modal Learning

1213 directly classified papers

Papers

ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion WACV 2025

DMPT: Decoupled Modality-Aware Prompt Tuning for Multi-Modal Object Re-Identification WACV 2025

ActionDiffusion: An Action-Aware Diffusion Model for Procedure Planning in Instructional Videos WACV 2025

To Ask or Not to Ask? Detecting Absence of Information in Vision and Language Navigation WACV 2025

CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets WACV 2025

When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models NAACL 2025

MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval ACL 2025

Beyond the Mode: Sequence-Level Distillation of Multilingual Translation Models for Low-Resource Language Pairs NAACL 2025

CityNav: A Large-Scale Dataset for Real-World Aerial Navigation ICCV 2025

From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments NAACL 2025

StuD: A Multimodal Approach for Stuttering Detection with RAG and Fusion Strategies IJCNLP 2025

Mind the Gap: Aligning Vision Foundation Models to Image Feature Matching ICCV 2025

RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation NAACL 2025

CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ACL 2025

MultiCoPIE: A Multilingual Corpus of Potentially Idiomatic Expressions for Cross-lingual PIE Disambiguation NAACL 2025

HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding? ICCV 2025

Dynamic Group Detection using VLM-augmented Temporal Groupness Graph ICCV 2025

Dynamic Interactive Bimodal Hypergraph Networks for Emotion Recognition in Conversations AAAI 2025

Capturing the Unseen: Vision-Free Facial Motion Capture Using Inertial Measurement Units AAAI 2025

MSR: A Multifaceted Self-Retrieval Framework for Microscopic Cascade Prediction AAAI 2025

Can Large Language Models Classify and Generate Antimicrobial Resistance Genes? ACL 2025

Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion CVPR 2025

DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving CVPR 2025

PAI at SemEval-2025 Task 11: A Large Language Model Ensemble Strategy for Text-Based Emotion Detection ACL 2025

Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models CVPR 2025