Multi-Modal Learning

3194 directly classified papers

Papers

Super-Class Guided Transformer for Zero-Shot Attribute Classification AAAI 2025

Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning AAAI 2025

Utilizing Vision-Language Models for Detection of Leaf-Based Diseases in Tomatoes AAAI 2025

ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation ICCV 2025

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models ICCV 2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives ICCV 2025

Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow ICCV 2025

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning ICCV 2025

VCRMNER: Visual Cue Refinement in Multimodal NER using CLIP Prompts COLING 2025

PanSt3R: Multi-view Consistent Panoptic Segmentation ICCV 2025

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models CVPR 2025

G2SF: Geometry-Guided Score Fusion for Multimodal Industrial Anomaly Detection ICCV 2025

BottleHumor: Self-Informed Humor Explanation using the Information Bottleneck Principle ACL 2025

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World ICCV 2025

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding CVPR 2025

CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation ICCV 2025

Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment CVPR 2025

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology ICCV 2025

CTYUN-AI at SemEval-2025 Task 1: Learning to Rank for Idiomatic Expressions SEMEVAL 2025

DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing ICCV 2025

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models CVPR 2025

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance ICCV 2025

Joint Vision-Language Social Bias Removal for CLIP CVPR 2025

ReMP-AD: Retrieval-enhanced Multi-modal Prompt Fusion for Few-Shot Industrial Visual Anomaly Detection ICCV 2025

What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations ACL 2025