Multi-Modal Learning

3194 directly classified papers

Papers

Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation AAAI 2024

FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval AAAI 2024

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation CVPR 2024

LAMM: Label Alignment for Multi-Modal Prompt Learning AAAI 2024

Improving Panoptic Narrative Grounding by Harnessing Semantic Relationships and Visual Confirmation AAAI 2024

Joint Demosaicing and Denoising for Spike Camera AAAI 2024

Have We Ever Encountered This Before? Retrieving Out-of-Distribution Road Obstacles From Driving Scenes WACV 2024

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer AAAI 2024

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis AAAI 2024

Annotation-Free Audio-Visual Segmentation WACV 2024

See Say and Segment: Teaching LMMs to Overcome False Premises CVPR 2024

Improving Audio-Visual Segmentation with Bidirectional Generation AAAI 2024

Object Attribute Matters in Visual Question Answering AAAI 2024

VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding CVPR 2024

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models CVPR 2024

Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis CVPR 2024

Local-Global Multi-Modal Distillation for Weakly-Supervised Temporal Video Grounding AAAI 2024

Open-Vocabulary Video Relation Extraction AAAI 2024

PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor CVPR 2024

Bi-directional Adapter for Multimodal Tracking AAAI 2024

Cross-Constrained Progressive Inference for 3D Hand Pose Estimation with Dynamic Observer-Decision-Adjuster Networks AAAI 2024

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers NeurIPS 2024

Selectively Answering Visual Questions ACL 2024

LiT: Unifying LiDAR "Languages" with LiDAR Translator NeurIPS 2024

Stitching Segments and Sentences towards Generalization in Video-Text Pre-training AAAI 2024