Multi-Modal Learning

3194 directly classified papers

Papers

Cross-Constrained Progressive Inference for 3D Hand Pose Estimation with Dynamic Observer-Decision-Adjuster Networks AAAI 2024

SC-NeuS: Consistent Neural Surface Reconstruction from Sparse and Noisy Views AAAI 2024

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations AAAI 2024

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training AAAI 2024

Improving Audio-Visual Segmentation with Bidirectional Generation AAAI 2024

Prompting Multi-Modal Image Segmentation with Semantic Grouping AAAI 2024

Chitranuvad: Adapting Multi-lingual LLMs for Multimodal Translation EMNLP 2024

DCU ADAPT at WMT24: English to Low-resource Multi-Modal Translation Task EMNLP 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation AAAI 2024

Multilingual Synopses of Movie Narratives: A Dataset for Vision-Language Story Understanding EMNLP 2024

Video Discourse Parsing and Its Application to Multimodal Summarization: A Dataset and Baseline Approaches EMNLP 2024

COMMA: Co-articulated Multi-Modal Learning AAAI 2024

MMAR: Multilingual and Multimodal Anaphora Resolution in Instructional Videos EMNLP 2024

Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA EMNLP 2024

LAMM: Label Alignment for Multi-Modal Prompt Learning AAAI 2024

Improving Hierarchical Text Clustering with LLM-guided Multi-view Cluster Representation EMNLP 2024

Retrieval-enriched zero-shot image classification in low-resource domains EMNLP 2024

SoftCLIP: Softer Cross-Modal Alignment Makes CLIP Stronger AAAI 2024

Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions? EMNLP 2024

VIEWS: Entity-Aware News Video Captioning EMNLP 2024

Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions AAAI 2024

VHASR: A Multimodal Speech Recognition System With Vision Hotwords EMNLP 2024

Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas EMNLP 2024

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions AAAI 2024

CARER - ClinicAl Reasoning-Enhanced Representation for Temporal Health Risk Prediction EMNLP 2024