cross-modal learning

521 papers

Also known as

CMP C3HOST

Co-occurring keywords

multimodal learning (4622), contrastive learning (3979), knowledge distillation (3680), representation learning (6174), multi-modal learning (1276), vision-language model (2235), self-supervised learning (3751), domain adaptation (4578), video understanding (1647), zero-shot learning (3637)

Papers

iMoT: Inertial Motion Transformer for Inertial Navigation AAAI 2025

UniDxMD: Towards Unified Representation for Cross-Modal Unsupervised Domain Adaptation in 3D Semantic Segmentation ICCV 2025

CmEAA: Cross-modal Enhancement and Alignment Adapter for Radiology Report Generation COLING 2025

Towards Multilingual spoken Visual Question Answering system using Cross-Attention COLING 2025

Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation ACL 2025

WildSAT: Learning Satellite Image Representations from Wildlife Observations ICCV 2025

Electron Density-enhanced Molecular Geometry Learning IJCAI 2025

Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era IJCAI 2025

CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification AAAI 2025

Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding WACV 2025

Learning Visual Proxy for Compositional Zero-Shot Learning ICCV 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition ICCV 2025

Cross-modal Collaborative Representation Learning for Text-to-Image Person Retrieval IJCAI 2025

Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification WACV 2025

VILLS: Video-Image Learning to Learn Semantics for Person Re-Identification WACV 2025

GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations ICCV 2025

TokenBinder: Text-Video Retrieval with One-to-Many Alignment Paradigm WACV 2025

Seeking Proxy Point via Stable Feature Space for Noisy Correspondence Learning IJCAI 2025

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification WACV 2025

Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer WACV 2025

Cross-Modal Learning for Music-to-Music-Video Description Generation NAACL 2025

SSN_MMHS@DravidianLangTech 2025: A Dual Transformer Approach for Multimodal Hate Speech Detection in Dravidian Languages NAACL 2025

PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos CVPR 2025

HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image ICCV 2025

DTW-Align: Bridging the Modality Gap in End-to-End Speech Translation with Dynamic Time Warping Alignment EMNLP 2025