Multi-Modal Learning

1213 directly classified papers

Papers

Boosting Vision-Language Models with Transduction NIPS 2024

Listenable Maps for Zero-Shot Audio Classifiers NIPS 2024

MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset NIPS 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks NIPS 2024

LaSe-E2V: Towards Language-guided Semantic-aware Event-to-Video Reconstruction NIPS 2024

MambaTree: Tree Topology is All You Need in State Space Model NIPS 2024

Continual Audio-Visual Sound Separation NIPS 2024

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models NIPS 2024

CultureLLM: Incorporating Cultural Differences into Large Language Models NIPS 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms NIPS 2024

Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models NIPS 2024

MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning NIPS 2024

Boosting Weakly Supervised Referring Image Segmentation via Progressive Comprehension NIPS 2024

Omnipotent Distillation with LLMs for Weakly-Supervised Natural Language Video Localization: When Divergence Meets Consistency AAAI 2024

Federated Modality-Specific Encoders and Multimodal Anchors for Personalized Brain Tumor Segmentation AAAI 2024

Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution AAAI 2024

Joint Demosaicing and Denoising for Spike Camera AAAI 2024

Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval AAAI 2024

Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification AAAI 2024

Uncertainty-Aware Yield Prediction with Multimodal Molecular Features AAAI 2024

FT-GAN: Fine-Grained Tune Modeling for Chinese Opera Synthesis AAAI 2024

Transformer-Empowered Multi-Modal Item Embedding for Enhanced Image Search in E-commerce AAAI 2024

TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling AAAI 2024

Spatial-Temporal Augmentation for Crime Prediction (Student Abstract) AAAI 2024

Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering ACL 2024