Transformer-based mass detection in digital mammograms

Marrocco C.; Molinara M.; Bria A.
2023-01-01

Abstract

In the last decade, Convolutional Neural Networks (CNNs) have been the de facto approach for automated detection in medical images. Recently, Vision Transformers have emerged in computer vision as an alternative to CNNs. Specifically, the Shifted Window (Swin) Transformer is a general-purpose backbone that learns attention-based hierarchical features and achieves state-of-the-art performance in a variety of vision tasks. In this work, for the first time, we design and evaluate transformer-based models for mass detection in digital mammograms, using the Swin Transformer as a multiscale backbone feature extractor. Experiments on OMI-DB, the largest publicly available mammography image database, yield a True Positive Rate (TPR) of 75.7 % at 0.1 False Positives per Image (FPpI) for the best transformer model, a 2.5 % TPR improvement over its convolutional counterpart and a 7.4 % TPR improvement over the previous state of the art. We also combine transformer- and convolution-based detectors with weighted box fusion, achieving an additional 2.4 % TPR improvement and reaching 78.1 % TPR at 0.1 FPpI.
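The operating point reported in the abstract, TPR at 0.1 FPpI, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `tpr_at_fppi`, the `(score, lesion_id)` tuple layout, and the assumption that each detection has already been matched against the ground-truth masses are all hypothetical choices made for illustration. Detections are swept in order of decreasing confidence, and the sweep stops just before the false-positive rate would exceed the target.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (confidence score, index of the ground-truth mass that
# the detection hits, or None if the detection is a false positive).
Detection = Tuple[float, Optional[int]]


def tpr_at_fppi(
    detections: List[Detection],
    num_lesions: int,
    num_images: int,
    target_fppi: float,
) -> float:
    """Return the highest TPR reachable with at most `target_fppi` false
    positives per image. Assumes distinct scores (ties are not handled)."""
    if num_lesions <= 0 or num_images <= 0:
        raise ValueError("num_lesions and num_images must be positive")

    hit_lesions = set()   # each mass counts once, however often it is hit
    false_positives = 0

    # Lowering the score threshold admits detections in descending score order.
    for _score, lesion_id in sorted(detections, key=lambda d: d[0], reverse=True):
        if lesion_id is None:
            # Stop before admitting a false positive that breaks the budget.
            if (false_positives + 1) / num_images > target_fppi:
                break
            false_positives += 1
        else:
            hit_lesions.add(lesion_id)

    return len(hit_lesions) / num_lesions


# Toy data (made up for illustration, not taken from the paper).
toy = [(0.9, 0), (0.8, None), (0.7, 1), (0.6, None), (0.5, 2)]
toy_tpr = tpr_at_fppi(toy, num_lesions=4, num_images=10, target_fppi=0.1)
```

With 10 images, a budget of 0.1 FPpI allows exactly one false positive, so the sweep in the toy example stops at the second false positive and counts two of the four masses as detected.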

Use this identifier to cite or link to this document: https://hdl.handle.net/11580/96503