Prostate Cancer Yolo
Keywords: Deep learning; Gleason grading; Prostate cancer detection; Prostate tissue classification

Purpose: Developing an artificial intelligence-based prostate cancer detection and diagnosis system that can automatically determine important regions and accurately classify the determined regions on an input biopsy image.

Method: The Yolo general-purpose object detection algorithm was utilized to detect important regions (the localization task) and to grade the detected regions (the classification task). The algorithm was re-trained with our prostate cancer dataset. The dataset was created by annotating 500 real prostate tissue biopsy images. It was split into train/test parts of 450/50 real prostate tissue images, respectively, before the data augmentation process. Next, the training set of 450 labeled biopsy images was pre-processed with the data augmentation method, which increased the number of biopsy images in the training set from 450 to 1776. Then, the algorithm was trained with the dataset and the automatic prostate cancer detection and diagnosis tool was developed.

Results: The developed tool was tested with two test sets. The first test set contains 50 images that are similar to the train set; on this set, 97% detection and classification accuracy was achieved. The second test set contains 137 completely different real prostate tissue biopsy images; on this set, 89% detection accuracy was achieved.

Conclusion: In this study, an automatic prostate cancer detection and diagnosis tool was developed. The test results show that high-accuracy (high-performance) prostate cancer diagnosis tools can be developed using AI (computer vision) methods such as object detection algorithms. These systems can decrease the inter-observer variability among pathologists and help prevent time delays in the diagnosis phase.
∗ Corresponding author.
E-mail addresses: eminsalman@sakarya.edu.tr (M.E. Salman), gozdec123@gmail.com (G. Çakirsoy Çakar), jahongir.azimjonov1@ogr.sakarya.edu.tr
(J. Azimjonov), mustafakosem@sakarya.edu.tr (M. Kösem), cedim@sakarya.edu.tr (İ.H. Cedimoğlu).
https://doi.org/10.1016/j.eswa.2022.117148
Received 16 June 2021; Received in revised form 7 August 2021; Accepted 29 March 2022
Available online 12 April 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.
M.E. Salman et al. Expert Systems With Applications 201 (2022) 117148
For example, the accuracy rate of PCa diagnosis approaches such as traditional machine learning, probabilistic, and meta-heuristic methods, which were evaluated in the past on other data sets containing few case examples, is between 50% and 80%. Our system achieved 97% (Test 1) and 89% (Test 2) overall PCa detection and classification accuracy in PCa tissue analysis. Test 1 contains prostate tissue images similar to the training set; thus, a higher accuracy was obtained. The system was also tested with the Test 2 set, which contains 137 prostate tissue images that are completely different from the training set. In addition, the biopsy images in the Test 2 set (1960 × 4032 pixels) differ in size from the biopsy images in the training set (3024 × 4032 pixels). The developed system is illustrated in Fig. 1.

2. Literature review

The goal of predictive models in medical image processing is to develop computer-aided algorithms that can predict future events and developments related to healthcare systems (Shariat, Kattan, Vickers, Karakiewicz, & Scardino, 2009). Therefore, many computer-aided traditional machine learning or artificial intelligence models have been developed to diagnose PCa disease. Many computational intelligence approaches, such as meta-heuristic optimization algorithms, artificial neural networks, support vector machines, fuzzy-based approaches and deep learning, have been applied to PCa prediction modeling (de Oliveira et al., 2013).

For instance, meta-heuristic approaches such as genetic algorithm, ant colony, and particle swarm optimization algorithms have been proposed to diagnose PCa by analyzing trans-rectal ultrasound images or to determine whether a biopsy is necessary. Underwood et al. developed a genetic algorithm-based simulation model and compared it with the previously recommended screening policies (Underwood, Zhang, Denton, Shah, & Inman, 2012). Shi et al. proposed a new approach to segment prostate ultrasound images using the genetic optimization algorithm. In the first step, the boundary curve representation was obtained using principal component analysis (PCA). Next, the prostate margin features were determined, and in the third stage, the genetic algorithm was applied to optimize the implicit curve display parameters (Shi, Liu, & Wu, 2013). Thangavel et al. applied ant colony optimization to identify cancer from trans-rectal ultrasound (TRUS) images in the PCa classification system. First, the relevant regions were identified from the TRUS images and ant colony optimization was applied to the relevant region to find the best regions. After selecting the best features, support vector machines (SVM) were applied to classify them as benign or malignant (Abraham & Nair, 2019; Thangavel & Manavalan, 2014). Sadoughi et al. developed a system for PCa diagnosis by combining back propagation artificial neural network features with Particle Swarm Optimization (PSO). In their study with data from 360 patients, they achieved 98.33% success during training. The authors state that they are unable to make a concrete diagnosis of prostate cancer but provide useful information to physicians in evaluating whether a biopsy is necessary or not (Sadoughi & Ghaderzadeh, 2014).

Meta-heuristic algorithms have generally been used to determine whether a biopsy is necessary or not. However, other computational intelligence algorithms have been deemed more appropriate to examine and evaluate biopsy tissue. For example, artificial neural networks have been widely used to diagnose biopsy images, and artificial neural network-based approaches have been shown to adapt better to the PCa detection process than other computational intelligence algorithms, because an artificial neural network generally performs well in classifying data that is not included in the learning process (Cosma, Brown, Archer, Khan, & Graham Pockley, 2017). Saritas et al. designed an artificial neural network model for the diagnosis of PCa disease. They used data from 121 patients with a definite diagnosis; 70% of the data was used as the training set and 30% as test data. Their model achieved a success rate of 94.44%, which could help physicians to make fast and reliable diagnoses (Saritas, Ozkan, & Sert, 2010). Greenblatt et al. integrated a neural network and a support vector machine (SVM) into a computer-aided diagnostic tool for predicting the Gleason grade of PCa from histopathological images. First, a quaternion neural network used for multi-class classification was applied to predict the Gleason grade. Later, a dual support vector machine (SVM) was used to refine the classification. They achieved a 98.87% success rate in estimating the Gleason grade (Grade 3, 4, 5) on a data set consisting of 71 patient images (Greenblatt, Mosquera-Lopez, & Agaian, 2013). Support vector machines are a method used for classifying linear and nonlinear data. As an example, Haq et al. developed a dynamic contrast-enhanced MRI (DCE-MRI) system for prostate cancer prediction (Jensen, Carl, Boesen, Langkilde, & Østergaard, 2019; Vente, Vos, Hosseinzadeh, Pluim, & Veta, 2021). Various approaches have been used for data-driven feature extraction. Using the extracted features, a prediction
Fig. 3. Annotated images. The pathologist annotated vital regions and graded the regions using the Gleason grading system.
model based on the SVM classification algorithm was created. To test the model, 449 tissue regions were obtained from 16 patients. A diagnosis accuracy of 86% was achieved in the study (Haq et al., 2015).

In probabilistic approaches, methods based on artificial neural networks, or SVM-based algorithms, the data belonging to the selected object region either belong 100% to a set or not at all. This is contrary to the way a pathologist examines and diagnoses prostate tissue (López-Úbed et al., 2020). In contrast, many studies have shown that fuzzy logic-based diagnostic algorithms can examine and analyze prostate tissue patterns in a way closer to human assessment, as data can belong partially to one cluster and, to a certain extent, to another cluster simultaneously. For example, one study developed a fuzzy system combining clinical stage and biopsy Gleason score to predict the pathological stage of PCa and showed that the developed system performed better than Partin probability tables (Castanho, Hernandes, De Ré, Rautenberg, & Billis, 2013). In another study, function values generated from the data of 190 patients were determined using ROC curve test performance analysis to predict PCa diagnosis. The system was observed to perform similarly to the predictions presented in probability tables (de Paula Castanho, de Barros, Yamakami, & Vendite, 2008). Benecchi predicted PCa disease using the coactive neuro-fuzzy inference system (CANFIS). That study, with data from 1030 patients, found CANFIS to be superior to a prostate-specific antigen blood test (Benecchi, 2006). Cosma et al. developed a neuro-fuzzy model for the classification and prediction of organ-confined disease (OCD) and extra prostatic disease (ED) using the PCa data set obtained from The Cancer Genome Atlas (TCGA). In the developed model, the age parameter was taken as an additional input alongside the test results. The estimation result obtained from the model showed better performance than other computational intelligence systems (Cosma et al., 2016).

Traditional machine learning, statistical and probabilistic, or fuzzy logic-based algorithms encounter problems such as the inability to detect rapid color or shape change, brightness or lighting problems, and illusion or occlusion in object detection and classification processes (Covic et al., 2005). However, the CNN architecture, which consists of three main layers, namely the input layer, convolution layers and the fully connected classification layer, together with powerful activation and error functions in these layers, is robust against the problems that traditional machine learning, statistical and probabilistic algorithms faced in the past. In addition, CNN architectures have been found to perform better in PCa diagnosis from pathological images (Arvaniti et al., 2018). Apart from that, there are many CNN-based object detection and classification algorithms, including ImageNet (Russakovsky et al., 2015), VGG-16 (Simonyan & Zisserman, 2015), Inception-V3 (Szegedy, Vanhoucke, Ioffe, Shlens, & Wojna, 2015), ResNet-50 (He, Zhang, Ren, & Sun, 2015), DenseNet-121 (Huang, Liu, Van Der Maaten, & Weinberger, 2017), MobileNet SSD (Howard et al., 2017), R-CNN, FR-CNN (Girshick, 2015) and Yolo (Redmon, Divvala, Girshick, & Farhadi, 2016), and they are actively being used for object detection and classification tasks. There are many studies conducted using CNN architecture for the diagnosis of PCa disease. The average accuracy of the previous CNN-based detection and classification systems is under 60%, which is not sufficient to use them with confidence for PCa diagnosis. The differing and low accuracy results of these studies are due to differences in model content, the small number of images in the data sets, and the training process.

For instance, the ImageNet, VGG-16, and Inception-V3 architectures do not contain an object or region localization feature. This has a direct and negative effect on detection and classification accuracy. Methods such as ResNet-50, DenseNet-121, MobileNet SSD, R-CNN, and FR-CNN do not scan the whole input image; instead, they only focus on a randomly selected, fixed number of regions. Consequently, these methods may miss crucial regions or focus on unimportant regions, and these two conditions lead to misdetection or misclassification, or cause the detection system to fail completely. In contrast, the Yolo object detection algorithm detects and classifies objects and detected regions by scanning all parts of an input image. For these reasons, it is powerful for object and region detection and classification tasks. To address the aforementioned PCa detection challenges, we have chosen the general-purpose Yolo object detection algorithm. Additionally, this study differs from other studies in that Yolo is a new method and there is no study conducted with Yolo to date in the diagnosis of prostate cancer disease. As stated above, the main reason we use the Yolo method in this study is that this algorithm determines the location of objects more accurately by analyzing the entire input image, and it has higher classification accuracy and a higher confidence level in detecting objects. In addition, the Yolo algorithm is superior to other algorithms in its ability to perform real-time object detection much faster (Redmon & Farhadi, 2018). Another distinguishing aspect of this study is that there are nearly 2000 real PCa tissue pattern images in the data set. The fact that the data set contains a sufficient amount of tissue samples has significantly increased the accuracy of the developed system. For example, the accuracy rate of PCa diagnosis approaches such as traditional machine learning, probabilistic, and meta-heuristic methods, which were evaluated in the past on data sets containing few case examples, is between 50% and 60%. The data set used in this study contains patterns created from more than 1780 real human prostate tissues, and the system has an object detection rate of over 90%.

3. Methodology

3.1. Feasibility of vision-based PCa diagnosis using Gleason grading system

Generally, the diagnosis and treatment of PCa rely on the histopathological analysis of prostate tissue samples and on grading them with a Gleason score based on the architectural pattern of the tumor. The Gleason grade, Gleason score and ISUP grade grouping systems are illustrated in Fig. 1 (Bulten et al., 2020). Benign, grade 3, grade 4, and grade 5 tissue patterns have distinctive pixel and shape features that are crucial for the training of deep learning-based PCa detection and classification algorithms. These distinctive features enable automatic and real-time PCa detection and diagnosis systems. In particular, the deep learning-based general-purpose object detection algorithm Yolo is well suited to these tasks since it has strong region localization functionality.
3.2. Data set preparation

This step is critical for the system since the proposed algorithm is developed using a fully supervised deep learning approach (Convolutional Neural Networks). The more accurately the training data set is prepared by annotating images, the higher the accuracy of the whole cancer detection and classification system. Hence, we have collected more than 500 prostate tissue biopsy images from different patients. The biopsy images were acquired from the Pathology Department of the Research and Training Hospital of Sakarya University. The RGB images of prostate tissues were obtained by scanning the slides using a digital camera (Samsung Note9, Wide-angle Super Speed Dual Pixel 12MP AF sensor, 12-megapixel rear camera, 1.4-micron) attached to a microscope (Nikon Eclipse i80). These images were scanned at 20× optical magnification. Whole slide tissue images of 500 biopsy cores, which contain different grades of prostatic acinar adenocarcinoma and benign areas, were stained with H&E compounds. Regions of interest (ROIs) of grade 3, grade 4, and grade 5 were accurately annotated on the whole slide images. The original size of the images in the training set and test set 1 is 4032 × 3024 (Width × Height) pixels. In test set 2, the size of the images is 4032 × 1960 (Width × Height) pixels. Then, one pathology expert prepared the dataset by annotating each image into four categories, namely Benign, Grade 3, Grade 4, and Grade 5. The "labelImg" image annotation tool (an open-source tool distributed under the MIT license) was utilized to annotate the biopsy tissue images in the dataset. Samples annotated by pathologists are illustrated in Fig. 3. The ground truth data set images were prepared by a pathologist, and reviewed and approved by at least two other independent pathologists. The data set was then enlarged by passing it through a two-step pre-processing stage including fine-tuning and data augmentation (replication). The augmentation stage includes horizontal and vertical flips, rotations (90, 180, 270 degrees), crops (by the annotated bounding-box ratio, for example, 30% or 50% of an original image), scaling up and down, and merged images of cropped samples (according to cancer grades) from different images. Image examples from the data augmentation step are shown in Fig. 2. At the end of this process, the data set contained about 1800 images and was ready to be forwarded to the main module (object detection and classification). The dataset was split into the training set and test set 1 after data augmentation; hence, the training set and test set 1 might contain similar or identical images. Therefore, we prepared test set 2, which consists of completely distinct biopsy images from different patients. The size of the images in test set 2 is also different from that of the images in the training set (Tables 2–4).

3.3. The proposed cancer detection and classification system via re-training the fine-tuned Yolo algorithm

The general-purpose weight model of Yolo was initially trained via the general-purpose architecture of Yolo (Fig. 4) on the Microsoft COCO dataset to detect 80 everyday object classes, from cars, phones, or planes to pens, books, or handbags. However, a pre-trained Yolo model does not immediately recognize benign or malignant prostate tissue images, because of its hyperparameter settings and the pre-learned features acquired from the COCO dataset. Therefore, a Yolo-based PCa detection system requires fine-tuning and re-training the general-purpose model of Yolo to detect the important regions and to classify the detected regions of a prostate biopsy image as benign, grade 3, 4, or 5. First, the Yolo model was fine-tuned. Next, the fine-tuned model was re-trained with the PCa dataset. The Yolo model comes with a neural network configuration file, called yolov3.cfg, in which the structure of the Yolo neural net is defined. The file describes 106 layers in total (input, hidden, pooling, and fully connected output layers). The linear, ReLU, and Leaky ReLU activation functions are used in the general-purpose model. However, the features of prostate biopsy images in the PCa dataset are not linearly distributed. Hence, the linear activation functions were changed to non-linear activation functions (Sigmoid and Tanh). The Sigmoid and Tanh activation functions are given in Eqs. (1) and (2), respectively.

\[
f(x) = \mathrm{Sigmoid}(x) = \frac{e^{x}}{e^{x} + 1} \tag{1}
\]

\[
f(x) = \mathrm{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}
\]

The next step in fine-tuning the model was to change the number of classes from the default 80 to four. The four classes correspond to the Gleason grades including benign, grade 3, 4, and 5, and the default of 80 corresponds to the classes of the Microsoft COCO dataset. With the newly defined number of classes, the filters value of the convolutional layer preceding each detection layer must also change from the default 255 to 27. The hyperparameter 'filters' is calculated as filters = (classes + 5) × 3, based on the formula in the original paper of the Yolo algorithm. So, the number of filters in our case is 27, since filters = (4 + 5) × 3. Other hyperparameters were initialized to their optimal values based on the recommendations in the original paper of the Yolo algorithm as follows:

• batch_size=24, subdivisions=8
• width=608, height=608, channels=3
• momentum=0.9, decay=0.0005, angle=0
• saturation=1.5, exposure=1.5, hue=0.1
• learning_rate=0.001, burn_in=1000
• max_batches=500200, policy=steps
• steps=400000,450000, scales=0.1,0.1

The fully connected and output layers are linked to each other with SoftMax (Logistic Regression) loss functions (see Eq. (3)). The loss function contains two parts: the localization loss, for predicting the four coordinates of the bounding boxes, and the classification loss, for estimating the conditional class labels and computing the confidence level of the detected candidate objects (regions). Both parts are calculated using sum-of-squared-error functions. Two scale parameters are used to control how much the loss from bounding box coordinate predictions is increased (\(\lambda_{coord}\)) and how much the loss of confidence score predictions for boxes without objects is decreased (\(\lambda_{noobj}\)). Down-weighting the loss contributed by background boxes is important as most of the
bounding boxes involve no instance. For the study, the model sets \(\lambda_{coord} = 5\) and \(\lambda_{noobj} = 0.5\). The loss function only penalizes classification error if an object is present in that grid cell, \(\mathbb{1}_{i}^{obj} = 1\). It also only penalizes bounding box coordinate error if that predictor is responsible for the ground truth box, \(\mathbb{1}_{ij}^{obj} = 1\). As a one-stage object detector, Yolo is very fast, and it produces sufficient results at recognizing regular and irregularly shaped objects or a group of small objects even though it has a limited number of bounding box candidates (Weng, 2018).

\[
\begin{aligned}
L_{loc} &= \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \right] \\
L_{cls} &= \sum_{i=0}^{S^2} \sum_{j=0}^{B} \left( \mathbb{1}_{ij}^{obj} + \lambda_{noobj} (1 - \mathbb{1}_{ij}^{obj}) \right) (C_{ij} - \hat{C}_{ij})^2 + \sum_{i=0}^{S^2} \sum_{c \in C} \mathbb{1}_{i}^{obj} (p_i(c) - \hat{p}_i(c))^2 \\
L &= L_{loc} + L_{cls}
\end{aligned}
\tag{3}
\]

where \(\mathbb{1}_{i}^{obj}\) is an indicator function of whether cell i contains an object; \(\mathbb{1}_{ij}^{obj}\) indicates whether the j-th bounding box of cell i is "responsible" for the object prediction; \(C_{ij}\) is the confidence score of cell i, Pr(containing an object) × IoU(pred, truth), with IoU > 0.5 for this study (see Fig. 5); \(\hat{C}_{ij}\) is the predicted confidence score; \(C\) is the set of all classes; \(p_i(c)\) is the conditional probability of whether cell i contains an object of class \(c \in C\); and \(\hat{p}_i(c)\) is the predicted conditional class probability.

\[
\begin{aligned}
Accuracy &= \frac{TP + TN}{TP + TN + FP + FN} \\
Recall &= \frac{TP}{TP + FN} \\
Precision &= \frac{TP}{TP + FP} \\
F_{score} &= \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
\end{aligned}
\tag{4}
\]

Fig. 6. Training Loss Function by 500 Epochs.

Table 1
Accuracy results of the system for the test set 1 that contains 174 images, which are similar to the images in the train set.

Gleason grades            System accuracy (%)   Three pathologists decision   Number of detected cases in test image set
Benign                    96.00                 100                           74
Grade 3                   98.00                 100                           132
Grade 4                   95.00                 100                           224
Grade 5                   100.0                 100                           162
Total/Average Accuracy    97.00                 100                           592

Table 2
Accuracy results of the system for the test set 2 that contains 137 completely different biopsy images.

Gleason grades            System accuracy (%)   Three pathologists decision   Number of detected cases in test image set
Benign                    71                    100                           59
Grade 3                   93                    100                           464
Grade 4                   89                    100                           222
Grade 5                   84                    100                           137
Total/Average Accuracy    89                    100                           882

3.4. Data fusion at decision level

systems, it is shown in Fig. 1. There are nine IF-THEN rules, Eq. (5).

\[
ISUP(G1, G2) =
\begin{cases}
1, & \text{if } G1 = 3 \text{ and } G2 = 3 \\
2, & \text{if } G1 = 3 \text{ and } G2 = 4 \\
3, & \text{if } G1 = 4 \text{ and } G2 = 3 \\
4, & \text{if } G1 = 4 \text{ and } G2 = 4 \\
4, & \text{if } G1 = 3 \text{ and } G2 = 5 \\
4, & \text{if } G1 = 5 \text{ and } G2 = 3 \\
5, & \text{if } G1 = 4 \text{ and } G2 = 5 \\
5, & \text{if } G1 = 5 \text{ and } G2 = 4 \\
5, & \text{if } G1 = 5 \text{ and } G2 = 5 \\
Benign, & \text{otherwise.}
\end{cases}
\tag{5}
\]
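As an illustration of the rule base in Eq. (5), the nine IF-THEN rules can be written as a lookup table. This is a minimal sketch and not the authors' implementation: the names `RULES` and `isup_grade_group` are hypothetical, and G1 and G2 are simply the two Gleason grades that appear in Eq. (5) (presumably the primary and secondary patterns; the text defining them is not reproduced in this section).

```python
from typing import Union

# Rule base of Eq. (5): (G1, G2) -> ISUP grade group.
# The dictionary name and layout are illustrative assumptions.
RULES = {
    (3, 3): 1,
    (3, 4): 2,
    (4, 3): 3,
    (4, 4): 4,
    (3, 5): 4,
    (5, 3): 4,
    (4, 5): 5,
    (5, 4): 5,
    (5, 5): 5,
}


def isup_grade_group(g1: int, g2: int) -> Union[int, str]:
    """Return the ISUP grade group for a grade pair, or "Benign" otherwise."""
    return RULES.get((g1, g2), "Benign")


if __name__ == "__main__":
    # The order of the two grades matters: 3+4 and 4+3 map to different groups.
    print(isup_grade_group(3, 4))
    print(isup_grade_group(4, 3))
    print(isup_grade_group(0, 0))
```

A dictionary keeps the rules in one place and makes the fall-through case of Eq. (5) explicit through the default value of `dict.get`.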
Table 3
Confusion matrix table for the test set 1. The test set 1 contains the biopsy images, which are similar to the biopsy images in the training set.

            Precision   Recall   F1-score   Support
Benign      0.96        0.99     0.97       74
Grade 3     0.98        0.98     0.98       132
Grade 4     0.95        0.99     0.97       224
Grade 5     1.00        0.93     0.96       162
Accuracy                         0.97       592
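The F1-score column of Table 3 can be re-derived from the precision and recall columns with the F-score formula of Eq. (4). The sketch below is only a consistency check on the published, rounded numbers; the dictionary `TABLE_3` and the helper `f_score` are names introduced here for illustration and are not part of the study's code.

```python
# (precision, recall) per class, copied from Table 3.
TABLE_3 = {
    "Benign": (0.96, 0.99),
    "Grade 3": (0.98, 0.98),
    "Grade 4": (0.95, 0.99),
    "Grade 5": (1.00, 0.93),
}


def f_score(precision: float, recall: float) -> float:
    """F-score of Eq. (4): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Round to two decimals, the precision used in the table.
f1_by_class = {
    name: round(f_score(p, r), 2) for name, (p, r) in TABLE_3.items()
}

if __name__ == "__main__":
    for name, f1 in f1_by_class.items():
        print(f"{name}: F1 = {f1:.2f}")
```

With these inputs the recomputed values agree with the F1-score column of Table 3 for all four classes.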
Table 4
Confusion matrix table for the test set 2. The test set 2 contains completely different 137 biopsy images than the biopsy images in the training set.

            Precision   Recall   F1-score   Support
Benign      0.71        0.88     0.79       59
Grade 3     0.93        0.99     0.96       464
Grade 4     0.89        0.73     0.80       222
Grade 5     0.84        0.80     0.82       137
Accuracy                         0.89       882
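For a single-label multi-class problem, the overall accuracy equals the support-weighted mean of the per-class recalls, because recall times support gives the number of correctly classified cases in each class. The following sketch applies this identity to the rounded values of Table 4 as a plausibility check. It assumes that every counted case carries exactly one ground-truth grade, and `TABLE_4` and `accuracy_from_recall` are illustrative names not taken from the study.

```python
from typing import Dict, Tuple

# (recall, support) per class, copied from Table 4.
TABLE_4: Dict[str, Tuple[float, int]] = {
    "Benign": (0.88, 59),
    "Grade 3": (0.99, 464),
    "Grade 4": (0.73, 222),
    "Grade 5": (0.80, 137),
}


def accuracy_from_recall(rows: Dict[str, Tuple[float, int]]) -> float:
    """Support-weighted mean recall, i.e. correct cases / all cases."""
    total = sum(support for _, support in rows.values())
    correct = sum(recall * support for recall, support in rows.values())
    return correct / total


total_support = sum(support for _, support in TABLE_4.values())
overall_accuracy = round(accuracy_from_recall(TABLE_4), 2)

if __name__ == "__main__":
    print(f"support = {total_support}, accuracy = {overall_accuracy:.2f}")
```

The total support reproduces the 882 cases in the last row of Table 4, and the weighted recall rounds to the reported accuracy of 0.89.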
5. Conclusion
Alom, M. Z., Aspiras, T., Taha, T. M., Asari, V. K., Bowen, T., Billiter, D., et Kweldam, C. F., Wildhagen, M. F., Steyerberg, E. W., Bangma, C. H., van der Kwast, T.
al. (2019). Advanced deep convolutional neural network approaches for digital H., & van Leenders, G. J. (2015). Cribriform growth is highly predictive for
pathology image analysis: a comprehensive evaluation with different use cases. postoperative metastasis and disease-specific death in gleason score 7 prostate
arXiv:1904.09075. cancer. Modern Pathology, [ISSN: 1530-0285] 28(3), 457–464. http://dx.doi.org/
Arvaniti, E., Fricker, K. S., Moret, M., Rupp, N., Hermanns, T., Fankhauser, C., et al. 10.1038/modpathol.2014.116.
(2018). Automated gleason grading of prostate cancer tissue microarrays via deep Lee, H., & Chen, Y.-P. P. (2015). Image based computer aided diagnosis system
learning. Scientific Reports, [ISSN: 2045-2322] 8(1), 12054. http://dx.doi.org/10. for cancer detection. Expert Systems with Applications, [ISSN: 0957-4174] 42(12),
1038/s41598-018-30535-1. 5356–5365. http://dx.doi.org/10.1016/j.eswa.2015.02.005.
Benecchi, L. (2006). Neuro-fuzzy system for prostate cancer diagnosis. Urology, 68(2), Li, W., Li, J., Sarma, K. V., Ho, K. C., Shen, S., Knudsen, B. S., et al. (2019). Path
357–361. R-CNN for prostate cancer diagnosis and gleason grading of histological images.
Bulten, W., Pinckaers, H., van Boven, H., Vink, R., de Bel, T., van Ginneken, B., et IEEE Transactions on Medical Imaging, 38(4), 945–954. http://dx.doi.org/10.1109/
al. (2020). Automated deep-learning system for Gleason grading of prostate cancer TMI.2018.2875868.
using biopsies: a diagnostic study. The Lancet Oncology, 21(2), 233–241. Litjens, G., Sánchez, C. I., Timofeeva, N., Hermsen, M., Nagtegaal, I., Kovacs, I.,
Castanho, M., Hernandes, F., De Ré, A., Rautenberg, S., & Billis, A. (2013). Fuzzy et al. (2016). Deep learning as a tool for increased accuracy and efficiency
expert system for predicting pathological stage of prostate cancer. Expert Systems of histopathological diagnosis. Scientific Reports, [ISSN: 2045-2322] 6(1), 26286.
with Applications, [ISSN: 0957-4174] 40(2), 466–470. http://dx.doi.org/10.1016/j. http://dx.doi.org/10.1038/srep26286.
eswa.2012.07.046. López-Úbed, P., Díaz-Galiano, M. C., Martín-Noguerol, T., na López, A. U., Martín-
Cinar, M., Engin, M., Engin, E. Z., & Ziya Atesci, Y. (2009). Early prostate cancer Valdivia, M.-T., & Luna, A. (2020). Detection of unexpected findings in radiology
diagnosis by using artificial neural networks and support vector machines. Expert reports: A comparative study of machine learning approaches. Expert Systems with
Systems with Applications, [ISSN: 0957-4174] 36(3, Part 2), 6357–6361. http://dx. Applications, [ISSN: 0957-4174] 160, Article 113647. http://dx.doi.org/10.1016/j.
doi.org/10.1016/j.eswa.2008.08.010. eswa.2020.113647.
Cosma, G., Acampora, G., Brown, D., Rees, R. C., Khan, M., & Pockley, A. G. (2016). Madabhushi, A., & Lee, G. (2016). Image analysis and machine learning in digital
Prediction of pathological stage in patients with prostate cancer: A neuro-fuzzy pathology: Challenges and opportunities. Medical Image Analysis, [ISSN: 1361-8415]
model. PLoS One, [ISSN: 1932-6203] 11(6), Article e0155856–e0155856. http: 33, 170–175. http://dx.doi.org/10.1016/j.media.2016.06.037, 20th anniversary of
//dx.doi.org/10.1371/journal.pone.0155856, 27258119[pmid]. the Medical Image Analysis journal (MedIA).
Cosma, G., Brown, D., Archer, M., Khan, M., & Graham Pockley, A. (2017). A survey on Nagpal, K., Foote, D., Liu, Y., Chen, P.-H. C., Wulczyn, E., Tan, F., et al. (2019).
computational intelligence approaches for predictive modeling in prostate cancer. Development and validation of a deep learning algorithm for improving Gleason
Expert Systems with Applications, [ISSN: 0957-4174] 70, 1–19. http://dx.doi.org/10. scoring of prostate cancer. Npj Digital Medicine, [ISSN: 2398-6352] 2(1), 48. http:
1016/j.eswa.2016.11.006. //dx.doi.org/10.1038/s41746-019-0112-2.
Covic, A., Schiller, A., Volovat, C., Gluhovschi, G., Gusbeth-Tatomir, P., Petrica, L., Nezhad, M. Z., Sadati, N., Yang, K., & Zhu, D. (2019). A Deep Active Survival Analysis
et al. (2005). Epidemiology of renal disease in Romania: a 10 year review of approach for precision treatment recommendations: Application of prostate cancer.
two regional renal biopsy databases. Nephrology Dialysis Transplantation, [ISSN: Expert Systems with Applications, [ISSN: 0957-4174] 115, 16–26. http://dx.doi.org/
0931-0509] 21(2), 419–424. http://dx.doi.org/10.1093/ndt/gfi207. 10.1016/j.eswa.2018.07.070.
de Oliveira, D. L. L., do Nascimento, M. Z., Neves, L. A., de Godoy, M. F., de Nir, G., Hor, S., Karimi, D., Fazli, L., Skinnider, B. F., Tavassoli, P., et al. (2018).
Arruda, P. F. F., & de Santi Neto, D. (2013). Unsupervised segmentation method Automatic grading of prostate cancer in digitized histopathology images: Learning
for cuboidal cell nuclei in histological prostate images based on minimum cross from multiple experts. Medical Image Analysis, [ISSN: 1361-8415] 50, 167–180.
entropy. Expert Systems with Applications, [ISSN: 0957-4174] 40(18), 7331–7340. http://dx.doi.org/10.1016/j.media.2018.09.005.
http://dx.doi.org/10.1016/j.eswa.2013.06.079.
de Paula Castanho, M. J., de Barros, L. C., Yamakami, A., & Vendite, L. L. (2008). Fuzzy expert system: An example in prostate cancer. Applied Mathematics and Computation, [ISSN: 0096-3003] 202(1), 78–85. http://dx.doi.org/10.1016/j.amc.2007.11.055.
Egevad, L., Ahmad, A., Algaba, F., Berney, D., Boccon-Gibod, L., Compérat, E., et al. (2013). Standardization of Gleason grading among 337 European pathologists. Histopathology, 62, 247–256. http://dx.doi.org/10.1111/his.12008.
Girshick, R. (2015). Fast R-CNN. In 2015 IEEE international conference on computer vision (ICCV) (pp. 1440–1448). http://dx.doi.org/10.1109/ICCV.2015.169.
Gleason, D. F. (1966). Automatic grading of prostate cancer in digitized histopathology images: Learning from multiple experts. Medical Image Analysis, 50(3), 125–128.
Greenblatt, A., Mosquera-Lopez, C., & Agaian, S. (2013). Quaternion neural networks applied to prostate cancer Gleason grading. In 2013 IEEE international conference on systems, man, and cybernetics (pp. 1144–1149). http://dx.doi.org/10.1109/SMC.2013.199.
Haq, N. F., Kozlowski, P., Jones, E. C., Chang, S. D., Goldenberg, S. L., & Moradi, M. (2015). A data-driven approach to prostate cancer detection from dynamic contrast enhanced MRI. Computerized Medical Imaging and Graphics, [ISSN: 0895-6111] 41, 37–45. http://dx.doi.org/10.1016/j.compmedimag.2014.06.017, Machine Learning in Medical Imaging.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2261–2269). http://dx.doi.org/10.1109/CVPR.2017.243.
Iczkowski, K. A., Torkko, K. C., Kotnis, G. R., Storey Wilson, R., Huang, W., Wheeler, T. M., et al. (2011). Digital quantification of five high-grade prostate cancer patterns, including the cribriform pattern, and their association with adverse outcome. American Journal of Clinical Pathology, [ISSN: 0002-9173] 136(1), 98–107. http://dx.doi.org/10.1309/AJCPZ7WBU9YXSJPE.
Jensen, C., Carl, J., Boesen, L., Langkilde, N., & Østergaard, L. (2019). Assessment of prostate cancer prognostic Gleason grade group using zonal-specific features extracted from biparametric MRI using a KNN classifier. Journal of Applied Clinical Medical Physics, [ISSN: 1526-9914] 20(2), 146–153. http://dx.doi.org/10.1002/acm2.12542.
Kallen, H., Molin, J., Heyden, A., Lundström, C., & Åström, K. (2016). Towards grading Gleason score using generically trained deep convolutional neural networks. In 2016 IEEE 13th international symposium on biomedical imaging (ISBI) (pp. 1163–1167). http://dx.doi.org/10.1109/ISBI.2016.7493473.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In 2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 779–788). http://dx.doi.org/10.1109/CVPR.2016.91.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv:1804.02767.
Ribeiro, M. G., Neves, L. A., do Nascimento, M. Z., Roberto, G. F., Martins, A. S., & Azevedo Tosta, T. A. (2019). Classification of colorectal cancer based on the association of multidimensional and multiresolution features. Expert Systems with Applications, [ISSN: 0957-4174] 120, 262–278. http://dx.doi.org/10.1016/j.eswa.2018.11.034.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet Large scale visual recognition challenge. International Journal of Computer Vision, [ISSN: 1573-1405] 115(3), 211–252. http://dx.doi.org/10.1007/s11263-015-0816-y.
Sadoughi, F., & Ghaderzadeh, M. (2014). A hybrid particle swarm and neural network approach for detection of prostate cancer from benign hyperplasia of prostate. Studies in Health Technology and Informatics, 205, 481.
Saritas, I., Ozkan, I. A., & Sert, I. U. (2010). Prognosis of prostate cancer by artificial neural networks. Expert Systems with Applications, [ISSN: 0957-4174] 37(9), 6646–6650. http://dx.doi.org/10.1016/j.eswa.2010.03.056.
Shariat, S. F., Kattan, M. W., Vickers, A. J., Karakiewicz, P. I., & Scardino, P. T. (2009). Critical review of prostate cancer predictive tools. Future Oncology (London, England), [ISSN: 1744-8301] 5(10), 1555–1584. http://dx.doi.org/10.2217/fon.09.121, 20001796[pmid].
Shi, Y., Liu, Y., & Wu, P. (2013). Level set priors based approach to the segmentation of prostate ultrasound image using genetic algorithm. Intelligent Automation & Soft Computing, 19(4), 537–544. http://dx.doi.org/10.1080/10798587.2013.869111.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556.
Ström, P., Kartasalo, K., Olsson, H., Solorzano, L., Delahunt, B., Berney, D. M., et al. (2020). Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. The Lancet Oncology, 21(2), 222–232.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015). Rethinking the inception architecture for computer vision. arXiv:1512.00567.
Thangavel, K., & Manavalan, R. (2014). Soft computing models based feature selection for TRUS prostate cancer image classification. Soft Computing, [ISSN: 1433-7479] 18(6), 1165–1176. http://dx.doi.org/10.1007/s00500-013-1135-2.
Underwood, D. J., Zhang, J., Denton, B. T., Shah, N. D., & Inman, B. A. (2012). Simulation optimization of PSA-threshold based prostate cancer screening policies. Health Care Management Science, [ISSN: 1572-9389] 15(4), 293–309. http://dx.doi.org/10.1007/s10729-012-9195-x.
M.E. Salman et al. Expert Systems With Applications 201 (2022) 117148
Vente, C. d., Vos, P., Hosseinzadeh, M., Pluim, J., & Veta, M. (2021). Deep learning regression for prostate cancer detection and grading in bi-parametric MRI. IEEE Transactions on Biomedical Engineering, 68(2), 374–383. http://dx.doi.org/10.1109/TBME.2020.2993528.
Wang, H., Roa, A. C., Basavanhally, A. N., Gilmore, H. L., Shih, N., Feldman, M., et al. (2014). Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features. Journal of Medical Imaging, 1(3), 1–8. http://dx.doi.org/10.1117/1.JMI.1.3.034003.
Weng, L. (2018). Object detection part 4: Fast detection models. https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html. (Accessed: 2020-10-30).
Zhou, N., Fedorov, A., Fennessy, F., Kikinis, R., & Gao, Y. (2017). Large scale digital prostate pathology image analysis combining feature extraction and deep neural network. arXiv:1705.02678.