Anatomical Pretext Tasks with Hybrid CNN-ViT Backbone for Enhanced SVM-Based Mammogram Analysis
DOI: https://doi.org/10.65204/djes.v3i1.652
Keywords: Mammogram Analysis, Self-Supervised Learning, Hybrid CNN–Vision Transformer, Anatomical Pretext Tasks, Support Vector Machine (SVM)
Abstract
We propose a method for improving the feature discriminability of mammogram analysis by integrating anatomical pretext tasks into a hybrid CNN–ViT backbone, followed by SVM classification. Conventional self-supervised approaches typically rely on generic image transforms, which do not necessarily capture the subtle clinical reasoning of radiologists. To address this, we design domain-specific pretext tasks that directly model anatomical priors: spatial context reconstruction, orientation prediction conditioned on ductal tree alignment, and lesion-context consistency via contrastive learning. The backbone combines ResNet-50 and a ViT, which extract local patterns and global context, respectively, producing an integrated representation that links hierarchical and long-range dependencies. These features are then reduced with PCA so that they remain compatible with SVM kernels while retaining anatomical significance. In contrast to existing approaches, our framework aligns the pretraining objectives with clinical workflows, improving feature interpretability and reducing the need for large labeled datasets. Experiments show that the proposed approach outperforms conventional deep feature extractors on mammogram classification tasks. Coupling self-supervised learning with anatomical reasoning opens new opportunities for medical image analysis, especially when few annotations are available. The contribution of the paper is to advance the synergy between radiologist-inspired feature learning and modern deep architectures, yielding a scalable pipeline that improves diagnostic accuracy.
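To make the pipeline concrete, below is a minimal sketch of the feature-extraction and classification stages, assuming torchvision's resnet50 and vit_b_16 as stand-ins for the paper's backbones. The HybridBackbone class, the feature dimensions, the 16 PCA components, and the random stand-in data are illustrative assumptions rather than the paper's reported configuration, and the anatomical pretext pretraining stage is omitted.

```python
# Sketch of the hybrid CNN-ViT feature extraction + PCA + SVM pipeline.
# Backbone choices, dimensions, and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50, vit_b_16
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class HybridBackbone(nn.Module):
    """Fuses local ResNet-50 features with global ViT context features."""

    def __init__(self):
        super().__init__()
        self.cnn = resnet50(weights=None)   # local, hierarchical patterns
        self.cnn.fc = nn.Identity()         # expose the 2048-d pooled vector
        self.vit = vit_b_16(weights=None)   # long-range, global context
        self.vit.heads = nn.Identity()      # expose the 768-d class-token embedding

    @torch.no_grad()
    def forward(self, x):                   # x: (B, 3, 224, 224)
        # Concatenate local and global features into one fused vector.
        return torch.cat([self.cnn(x), self.vit(x)], dim=1)  # (B, 2816)


backbone = HybridBackbone().eval()
images = torch.randn(32, 3, 224, 224)        # stand-in mammogram patches
labels = torch.randint(0, 2, (32,)).numpy()  # stand-in benign/malignant labels
features = backbone(images).numpy()

# PCA trims the fused 2816-d features so the SVM kernel stays tractable;
# 16 components here is dictated by the toy sample size, not the paper.
clf = make_pipeline(StandardScaler(), PCA(n_components=16), SVC(kernel="rbf"))
clf.fit(features, labels)
print(clf.predict(features[:4]))
```

In the full method, the backbone weights would first be learned through the anatomical pretext tasks (spatial context reconstruction, orientation prediction, lesion-context contrastive learning) before features are extracted for the SVM.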
References
S. J. S. Gardezi, M. Awais, I. Faye, and M. Hussain, “Mammogram classification using deep learning features,” in Proc. IEEE Int. Conf. Signal and Image Process. Appl. (ICSIPA), Sep. 2017, pp. 485–488, doi: 10.1109/ICSIPA.2017.8120660.
Y. Wu, D. Zeng, Z. Wang, Y. Shi, and J. Hu, “Distributed contrastive learning for medical image segmentation,” Med. Image Anal., vol. 80, p. 102564, May 2022, doi: 10.1016/j.media.2022.102564.
A. Taleb, C. Lippert, T. Klein, and M. Nabi, “Multimodal self-supervised learning for medical image analysis,” arXiv preprint arXiv:1912.05396, 2019.
W. Falcon and K. Cho, “A framework for contrastive self-supervised learning and designing a new approach,” arXiv preprint arXiv:2009.00104, 2020.
X. Meng, H. Yu, J. Fan, J. Mu, H. Chen, J. Luan, et al., “A self-supervised representation learning paradigm with global content perception and peritumoral context restoration for MRI breast tumor segmentation,” Biomed. Signal Process. Control, vol. 86, p. 107757, 2025, doi: 10.1016/j.bspc.2025.107757.
G. Dai, D. Dai, C. Wang, Q. Tang, et al., “Multi-task learning network for medical image analysis guided by lesion regions and spatial relationships of tissues,” IEEE Trans. Circuits Syst. Video Technol., to appear 2025, doi: 10.1109/TCSVT.2025.3596803.
K. He, C. Gan, Z. Li, I. Rekik, Z. Yin, W. Ji, Y. Gao, Q. Wang, et al., “Transformers in medical image analysis,” Intell. Med., vol. 3, no. 1, pp. 1-15, 2023, doi: 10.1016/j.imed.2022.07.002.
L. Jing, X. Yang, J. Liu, and Y. Tian, “Self-supervised spatiotemporal feature learning via video rotation prediction,” arXiv preprint arXiv:1811.11387, 2018.
C. Wei, L. Xie, X. Ren, Y. Xia, C. Su, J. Liu, et al., “Iterative reorganization with weak spatial constraints: Solving arbitrary jigsaw puzzles for unsupervised representation learning,” arXiv preprint arXiv:1812.00329, 2018, doi: 10.48550/arXiv.1812.00329.
L. Chen, P. Bentley, K. Mori, K. Misawa, M. Fujiwara, et al., “Self-supervised learning for medical image analysis using image context restoration,” Med. Image Anal., vol. 58, p. 101539, 2019, doi: 10.1016/j.media.2019.101539.
T. Zhang, D. Wei, M. Zhu, S. Gu, and Y. Zheng, “Self-supervised learning for medical image data with anatomy-oriented imaging planes,” Med. Image Anal., vol. 88, p. 103151, 2024, doi: 10.1016/j.media.2024.103151.
J. Kang, R. Fernandez-Beltran, P. Duan, et al., “Deep unsupervised embedding for remotely sensed images based on spatially augmented momentum contrast,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8741-8754, Dec. 2020, doi: 10.1109/TGRS.2020.3007029.
Z. Cao, Z. Deng, Z. Yang, J. Ma, and L. Ma, “Supervised contrastive pre-training models for mammography screening,” J. Big Data, vol. 13, no. 1, p. 75, 2025, doi: 10.1186/s40537-025-01075-z.
Z. Chen, Q. Gao, Y. Zhang, and H. Shan, “ASCON: Anatomy-aware supervised contrastive learning framework for low-dose CT denoising,” arXiv preprint arXiv:2307.12225, 2023. Available: https://doi.org/10.1007/978-3-031-43999-5_34.
H. Chen and A. L. Martel, “Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and convolutional neural networks,” J. Med. Imaging, vol. 12, Suppl. 2, p. S22007, 2025, doi: 10.1117/1.JMI.12.S2.S22007.
A. Zeynali, M. A. Tinati, and B. M. Tazehkand, “Hybrid CNN-Transformer architecture with Xception-based feature enhancement for accurate breast cancer classification,” IEEE Access, vol. 12, pp. 1-12, 2024, doi: 10.1109/ACCESS.2024.3516535.
D. Sun, M. Wang, H. Feng, and A. Li, “Prognosis prediction of human breast cancer by integrating deep neural network and support vector machine,” in Proc. 10th Int. Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, Oct. 2017, pp. 1-5, doi: 10.1109/CISP-BMEI.2017.8301908.
M. Lamba, G. Munjal, and Y. Gigras, “Supervising healthcare schemes using machine learning in breast cancer and internet of things (SHSMLIoT),” in Handbook of Research on Blockchain Technology, Academic Press, 2022, ch. 11, pp. 271-294, doi: 10.1002/9781119792468.ch11.
R. Sawyer-Lee, F. Gimenez, A. Hoogi, and D. Rubin, “Curated breast imaging subset of digital database for screening mammography (CBIS-DDSM),” The Cancer Imaging Archive, 2016. Available: https://doi.org/10.7937/k9/tcia.2016.7o02s9cy.
I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, et al., “INbreast: toward a full-field digital mammographic database,” Acad. Radiol., vol. 19, no. 2, pp. 236-248, Feb. 2012, doi: 10.1016/j.acra.2011.09.014.
S. Wang, D. Li, X. Song, Y. Wei, and H. Li, “A feature selection method based on improved fisher’s discriminant ratio for text sentiment classification,” Expert Syst. Appl., vol. 38, no. 1, pp. 158-162, Jan. 2011, doi: 10.1016/j.eswa.2011.01.077.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017, pp. 4700-4708, doi: 10.1109/CVPR.2017.243.
X. Chen, H. Fan, R. Girshick, and K. He, “Improved baselines with momentum contrastive learning,” arXiv preprint arXiv:2003.04297, 2020, doi: 10.48550/arXiv.2003.04297.
M. Caron, I. Misra, J. Mairal, P. Goyal, et al., “Unsupervised learning of visual features by contrasting cluster assignments,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Virtual, 2020, pp. 9912-9924, doi: 10.5555/3495724.3496555.
R. Azad, R. Arimond, E. K. Aghdam, A. Kazerouni, et al., “DAE-Former: Dual attention-guided efficient transformer for medical image segmentation,” arXiv preprint arXiv:2212.13504, 2022.
M. A. Hamza, S. B. H. Hassine, I. Abunadi, et al., “Feature selection with optimal stacked sparse autoencoder for data mining,” Comput. Mater. Continua (CMC), vol. 11, no. 4, pp. 1234-1248, 2022, doi: 10.32604/cmc.2022.024764.
M. Gazda, J. Plavka, J. Gazda, and P. Drotar, “Self-supervised deep convolutional neural network for chest X-ray classification,” IEEE Access, vol. 9, pp. 135000-135010, 2021, doi: 10.1109/ACCESS.2021.3125324.
W. Zhang, L. Zhan, P. Thompson, and Y. Wang, “Deep representation learning for multimodal brain networks,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Cham, Switzerland: Springer, 2020, pp. 520-531, doi: 10.1007/978-3-030-59728-3_60.
A. Y. El-Bastawissi, E. White, M. T. Mandelson, et al., “Variation in mammographic breast density by race,” Ann. Epidemiol., vol. 11, no. 2, pp. 161-165, 2001, doi: 10.1016/S1047-2797(00)00225-8.
M. H. U. Rehman, W. Hugo Lopez Pinaya, et al., “Federated learning for medical imaging radiology,” Br. J. Radiol., vol. 95, no. 1136, p. 20220890, 2022, doi: 10.1259/bjr.20220890.
Published: 2026-03-22