CrossFusion: A Multi-Scale Cross-Attention Convolutional Fusion Model for Cancer Survival Prediction
Abstract
Cancer survival prediction from whole slide images (WSIs) relies on capturing prognostic features that span multiple magnifications, from global tissue architecture to fine-grained cellular morphology. Current approaches face two main limitations. Most frameworks analyze a single scale and therefore overlook the hierarchical context of tissue, while existing multi-scale methods often rely on simplistic fusion mechanisms (e.g., direct concatenation) that fail to model cross-scale interactions. To address these challenges, we propose CrossFusion, a multi-scale architecture that introduces a convolutional fusion processor to perform scale–space integration. Evaluated on six TCGA cancer cohorts, CrossFusion achieves state-of-the-art concordance index (C-index) performance, consistently outperforming strong single-scale and multi-scale baselines. Furthermore, domain-specific pathology feature extractors yield additional gains in prognostic accuracy over general-purpose backbones.
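For readers less familiar with the evaluation metric, the C-index reported above can be illustrated with a minimal sketch. The code below is a plain-Python version of Harrell's concordance index under its standard definition; it is not the evaluation code used in this work, and the function name `concordance_index` and the toy cohort are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                # Patient i had the event before patient j's observed time.
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # correctly ordered pair
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the cohort")
    return concordant / comparable


# Toy cohort (illustrative values only): the third patient is censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 1.0 indicates that every comparable pair is ranked correctly, 0.5 corresponds to uninformative (for example, constant) predictions, and 0.0 indicates a fully inverted ranking.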