MIDL 2025

PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization

Abstract

Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and scarce localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is constrained to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. Using partial cross-entropy, PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications. Our extensive experiments¹ on GlaS and CAMELYON16 cancer datasets show that Pix