Multimodal Learning

1257 directly classified papers

Papers

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models NIPS 2024

Extended Multimodal Hate Speech Event Detection During Russia-Ukraine Crisis - Shared Task at CASE 2024 EACL 2024

Continual Vision-Language Retrieval via Dynamic Knowledge Rectification AAAI 2024

DenoiseRep: Denoising Model for Representation Learning NIPS 2024

Slicing Vision Transformer for Flexible Inference NIPS 2024

PLIP: Language-Image Pre-training for Person Representation Learning NIPS 2024

Rethinking Reverse Distillation for Multi-Modal Anomaly Detection AAAI 2024

CLIP in Mirror: Disentangling text from visual images through reflection NIPS 2024

ViLCo-Bench: VIdeo Language COntinual learning Benchmark NIPS 2024

Sketch-Based Video Object Localization WACV 2024

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers CVPR 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions NIPS 2024

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery CVPR 2024

Unsegment Anything by Simulating Deformation CVPR 2024

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric CVPR 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks NIPS 2024

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction CVPR 2024

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval CVPR 2024

AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval CVPR 2024

Visual Fourier Prompt Tuning NIPS 2024

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation CVPR 2024

Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model NIPS 2024

WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games NIPS 2024

Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques NIPS 2024

Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers NIPS 2024