conftrace_
2026 MIDL MIDL 2026

X-Cardia: Phenotype-Guided Cross-Modal Alignment for Opportunistic Cardiac Screening on Routine Chest CT

Abstract

comment Deep learning models for cardiac prognostics often operate within single-modality frameworks, limiting their ability to capture physiologically meaningful cross-modal relationships. In particular, we focus on non-gated, non-contrast chest computed tomography (CT) scans that are typically acquired for entirely non-cardiac indications, rather than for dedicated cardiac assessment. We introduce X-Cardia, a phenotype-guided multimodal alignment framework that transfers structural cardiac phenotypes from echocardiography (ECHO) and electrocardiography (ECG) into CT representations by enforcing explicit phenotype-level consistency. This setting is intrinsically challenging because the lack of cardiac gating obfuscates the cardiac phase and the absence of contrast limits the visibility of cardiovascular structures, but these scans represent a rich resource for opportunistic cardiac screening. The approach combines CLIP-style contrastive pre-training to align image and tabular embeddings with a non-parametric Nadaraya–Watson phenotype head, which uses a support-bank to guide the latent space toward clinically meaningful axes. This enables the image encoder to learn physiological features that are generalized beyond the modality boundaries. We pre-train using data from 20,574 patients and fine-tune the resulting image encoder on ten cardiac abnormality prediction tasks. The proposed method consistently outperforms both the standard contrastive learning and the baseline without pre-training, achieving a gain of up to 8% of AUROC on the test set. In the 5-shot setting, phenotype-guided alignment improves AUROC by an average of 9.8% over baselines, demonstrating strong data efficiency and generalization from few labeled samples. Our results show that explicit phenotype-guided alignment yields interpretable, data-efficient representations that transfer cardiac knowledge to non-cardiac CTs, defining a promising paradigm for multimodal medical imaging. comment Multimodal m