Breast Cancer Prediction Using Machine Learning
ABSTRACT
Breast cancer is one of the most common cancers among women worldwide, and it is among the leading causes of
cancer death in women. However, if the cancer is detected early and treated properly, it can be cured. Early
detection of breast cancer can dramatically improve the prognosis and chances of survival by allowing patients to
receive timely clinical therapy. Furthermore, precise classification of benign tumours can help patients avoid
unnecessary treatment. This study uses Convolutional Neural Networks (CNN) for an image dataset and K-Nearest
Neighbour (KNN), Decision Tree (CART), Support Vector Machine (SVM), and Naïve Bayes for a numerical
dataset whose features are obtained from digitised images of breast masses, in order to analyse the data and
predict cancer with improved accuracy. As part of the process, the datasets are analysed and evaluated, and the
models are trained. Finally, both image and numerical test data are used for prediction.
Keywords: IDC (Invasive Ductal Carcinoma), FNA (Fine Needle Aspirate), Breast cancer prediction,
Classifier algorithms, CNN (Convolutional Neural Network).
The “gold-standard” method for detecting cancer previously consisted of three parts: clinical evaluation,
radiological imaging, and pathology testing [18]. The proposed technique indicates the presence of cancer
based on regression, although newer algorithms are available. A model designed to predict on new data
should give good results in both its training and testing phases [19]. There are three main steps:
pre-processing, feature extraction, and classification. Figure 1 shows the types of breast cancer; in this
paper we consider IDC.
Radiology professionals frequently struggle with labelling mass lesions in mammography, which can lead to
unnecessary and costly breast biopsies. The paper's implementation was evaluated using three publicly
available benchmark datasets (the DDSM, INbreast, and BCDR databases) for training and testing, and the
MIAS dataset for testing only. The results showed that pairing PCNN with CNN outperforms other
approaches on the same publicly available datasets [1].
When mammographic breast tissue is dense, federal law requires that the patient be notified, because
increased density is a sign of breast cancer risk and can impair the sensitivity of mammography. The goal
of that study was to externally validate the authors' deep learning model against radiologist breast
density evaluations in a community breast imaging practice [7].
3. METHODOLOGY
Atlantis Highlights in Computer Sciences, volume 4
Figure 2 Data_Matrix_Representation
is derived from the anticipated and actual values.
3.5.2. K-Nearest Neighbour
KNN is a supervised machine learning algorithm used for classification. It computes the
distances between a query point and the points in the training data, and the class is then
decided by a majority vote among the nearest neighbours [30-32].
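The distance-then-vote procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this work: the function name `knn_predict`, the choice of Euclidean distance, the value k = 3, and the toy two-feature data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (assumed metric).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature data (made-up values, not taken from the paper's dataset).
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(train_points, train_labels, (1.0, 1.0), k=3))
print(knn_predict(train_points, train_labels, (3.0, 3.0), k=3))
```

An odd k is a common choice for two-class problems because it avoids tied votes.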
Figure 7 Plot of accuracy comparison
Thus, SVM is selected for further analysis and prediction. The model is fitted on the training data, and
the test data is passed to it as input for prediction.
Figure 8 shows a screenshot of the accuracy obtained using SVM.
Figure 8 Prediction accuracy using SVM for test data
Figure 10 Plot of training and validation loss
To make the system easy to use, a GUI is developed with the Flask framework; it connects the front end
to the back-end model, which processes the input and returns a prediction.
Medical practitioners can enter input values manually from patient records; on submission, the record is
classified as malignant or benign. An image can also be uploaded, which is then processed by the trained
model to produce a prediction.
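A Flask front end of the kind described for the GUI could be wired to a back-end model roughly as follows. This is a hedged sketch only: the `/predict` route, the JSON field name `features`, and the `classify_record` function are hypothetical, and the threshold rule inside `classify_record` is a placeholder standing in for the trained classifier, not the model from this work.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_record(features):
    """Hypothetical stand-in for the trained model (not the paper's SVM)."""
    # Placeholder rule so the sketch runs without a trained model:
    # a mean feature value above 0.5 is labelled malignant.
    mean_value = sum(features) / len(features)
    return "malignant" if mean_value > 0.5 else "benign"


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    # Reject missing or non-numeric input before calling the model.
    if not isinstance(features, list) or not features or not all(
        isinstance(v, (int, float)) for v in features
    ):
        return jsonify(error="features must be a non-empty list of numbers"), 400
    return jsonify(prediction=classify_record(features))


# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as client:
    response = client.post("/predict", json={"features": [0.9, 0.8, 0.7]})
    print(response.get_json())
```

In a real deployment the placeholder would be replaced by a call to the fitted classifier, and a second route would be needed to accept uploaded images for the image model.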
Figure 14 Numerical data record prediction: Malignant
Table 1: Comparison of accuracies obtained by different authors
Author                    Dataset                           Method               Accuracy
1. Proposed methodology   Numerical dataset                 SVM                  96.48%
                                                            CART                 91.88%
                                                            NB                   93.19%
                                                            KNN                  95.8%
                          IDC dataset                       CNN                  98.13%
2. Wang et al. [3]        Electronic health records         Logistic regression  96.4%
3. Akbugday [2]           Breast Cancer Wisconsin dataset   KNN                  96.85%
                                                            SVM                  96.85%
AUTHORS’ CONTRIBUTIONS
Both authors have contributed equally to the work.
ACKNOWLEDGMENTS
We thank all the faculty members and friends for their expertise and assistance throughout all aspects of
our study and for their help in writing the manuscript.
REFERENCES
[1] Meteb M. Altaf. A hybrid deep learning model for breast cancer diagnosis based on transfer learning and pulse-coupled neural networks.
[2] B. Akbugday, "Classification of Breast Cancer Data Using Machine Learning Algorithms," 2019 Medical Technologies Congress (TIPTEKNO), Izmir, Turkey, 2019, pp. 1-4.
[3] Wang, D., Khosla, A., Gargeya, R., Irshad, H. & Beck, A. H. Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718 (2016).
[4] Nazeri, K., Aminpour, A. & Ebrahimi, M. Two-stage convolutional neural network for breast cancer histology image classification. In International Conference Image Analysis and Recognition, 717–726 (Springer, 2018).
[5] Golatkar, A., Anand, D. & Sethi, A. Classification of breast cancer histology using deep learning. In International Conference Image Analysis and Recognition, 837–844 (Springer, 2018).
[6] Albarqouni, S. et al. Aggnet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE transactions on medical imaging 35, 1313–1321 (2016).
[7] B. N. Dontchos, A. Yala, R. Barzilay, J. Xiang, C. D. Lehman, External validation of a deep learning model for predicting mammographic breast density in routine clinical practice, Acad. Radiol., 28 (2020), 475-480.
[8] Rao, S. Mitos-rcnn: A novel approach to mitotic figure detection in breast cancer histopathology images using region based convolutional neural networks. arXiv preprint arXiv:1807.01788 (2018).
[9] Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Jama 318, 2199–2210 (2017).
[10] Bándi, P. et al. From detection of individual metastases to classification of lymph node status at the patient level: the camelyon17 challenge. IEEE Transactions on Med. Imaging (2018).
[11] Litjens, G. et al. 1399 h&e-stained sentinel lymph node sections of breast cancer patients: the camelyon dataset. GigaScience 7, giy065 (2018).
[12] Aresta, G. et al. Bach: Grand challenge on breast cancer histology images. arXiv preprint arXiv:1808.04277 (2018).
[13] Gouda I Salama, M Abdelhalim, and Magdy Abd-elghany Zeid. 2012. Breast cancer diagnosis on three different datasets using multiclassifiers. Breast Cancer (WDBC) 32, 569 (2012), 2.
[14] William H Wolberg, W Nick Street, and Olvi L Mangasarian. 1992. Breast cancer Wisconsin (diagnostic) data set. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/] (1992).
[15] Abien Fred Agarap. 2017. A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data. arXiv preprint arXiv:1709.03082 (2017).
[16] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio, (2014), “Learning phrase representations using RNN encoder-decoder for statistical machine translation”, arXiv preprint arXiv:1406.1078 (2014).
[17] C. Cortes and V. Vapnik, (1995), “Support-vector networks”, Machine Learning 20.3, (1995), 273–297. https://doi.org/10.1007/BF00994018.
[18] Gönen, M.; Alpaydın, E. Multiple kernel learning algorithms. J. Mach. Learn. Res. 2011, 12, 2211–2268.
[19] Ferroni, P.; Zanzotto, F.M.; Scarpato, N.; Riondino, S.; Nanni, U.; Roselli, M.; Guadagni, F. Risk assessment for venous thromboembolism in chemotherapy treated ambulatory cancer patients: A precision medicine approach. Med. Dec. Mak. 2017, 37, 234–242.
[20] Ferroni, P.; Roselli, M.; Zanzotto, F.M.; Guadagni, F. Artificial Intelligence for cancer-associated thrombosis risk assessment. Lancet Haematol. 2018, 5, e391.
[21] Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and other kernel-based learning methods. AI Magazine 2000, 22, 190.
[22] Matyas, J. Random optimization. Automat. Rem. Control 1965, 26, 246–253.
[23] Jain A, Levy D. 2016. Breast mass classification using deep convolutional neural networks. In: 30th conference on neural information processing systems (NIPS 2016). Barcelona, Spain. 1–6.
[24] Jiang F. 2017. Breast mass lesion classification in mammograms by transfer learning. In: ICBCB '17. Hong Kong, 59–62 DOI 10.1145/3035012.3035022.
[25] Ragab DA, Sharkas M, Marshall S, Ren J. 2019. Breast cancer detection using deep convolutional neural networks and support vector machines. PeerJ 7:e6201 http://doi.org/10.7717/peerj.6201.
[26] Keles, M. Kaya, "Breast Cancer Prediction and Detection Using Data Mining Classification Algorithms: A Comparative Study." Tehnicki Vjesnik - Technical Gazette, vol. 26, no. 1, 2019, p. 149+.
[27] Z. Guo, L. Tang, T. Guo, K. Yu, M. Alazab, A. Shalaginov, “Deep Graph Neural Network-based Spammer Detection Under the Perspective of Heterogeneous Cyberspace”, Future Generation Computer Systems, https://doi.org/10.1016/j.future.2020.11.028.
[28] Y. Sun, J. Liu, K. Yu, M. Alazab, K. Lin, “PMRSS: Privacy-preserving Medical Record Searching Scheme for Intelligent Diagnosis in IoT Healthcare”, IEEE Transactions on Industrial Informatics, doi: 10.1109/TII.2021.3070544.
[29] K. Yu, L. Tan, L. Lin, X. Cheng, Z. Yi and T. Sato, "Deep-Learning-Empowered Breast Cancer Auxiliary Diagnosis for 5GB Remote E-Health," IEEE Wireless Communications, vol. 28, no. 3, pp. 54-61, June 2021, doi: 10.1109/MWC.001.2000374.
[30] K. Yu, L. Tan, S. Mumtaz, S. Al-Rubaye, A. Al-Dulaimi, A. K. Bashir, F. A. Khan, “Securing Critical Infrastructures: Deep Learning-based Threat Detection in the IIoT”, IEEE Communications Magazine, 2021.
[31] K. Yu, Z. Guo, Y. Shen, W. Wang, J. C. Lin, T. Sato, “Secure Artificial Intelligence of Things for Implicit Group Recommendations”, IEEE Internet of Things Journal, 2021, doi: 10.1109/JIOT.2021.3079574.
[32] H. Li, K. Yu, B. Liu, C. Feng, Z. Qin and G. Srivastava, "An Efficient Ciphertext-Policy Weighted Attribute-Based Encryption for the Internet of Health Things," IEEE Journal of Biomedical and Health Informatics, 2021, doi: 10.1109/JBHI.2021.3075995.
[33] Puttamadappa, C., and B. D. Parameshachari. "Demand side management of small scale loads in a smart grid using glow-worm swarm optimization technique." Microprocessors and Microsystems 71 (2019): 102886.
[34] Rajendran, Ganesh B., Uma M. Kumarasamy, Chiara Zarro, Parameshachari B. Divakarachari, and Silvia L. Ullo. "Land-use and land-cover classification using a human group-based particle swarm optimization algorithm with an LSTM Classifier on hybrid pre-processing remote-sensing images." Remote Sensing 12, no. 24 (2020): 4135.
[35] Subramani, Prabu, Ganesh Babu Rajendran, Jewel Sengupta, Rocío Pérez de Prado, and Parameshachari Bidare Divakarachari. "A block bi-diagonalization-based pre-coding for indoor multiple-input-multiple-output-visible light communication system." Energies 13, no. 13 (2020): 3466.
[36] Hu, Liwen, Ngoc-Tu Nguyen, Wenjin Tao, Ming C. Leu, Xiaoqing Frank Liu, Md Rakib Shahriar, and SM Nahian Al Sunny. "Modeling of cloud-based digital twins for smart manufacturing with MT connect." Procedia manufacturing 26 (2018): 1193-1203.
[37] Seyhan, Kübra, Tu N. Nguyen, Sedat Akleylek, Korhan Cengiz, and SK Hafızul Islam. "Bi-GISIS KE: Modified key exchange protocol with reusable keys for IoT security." Journal of Information Security and Applications 58 (2021): 102788.
[38] Pham, Dung V., Giang L. Nguyen, Tu N. Nguyen, Canh V. Pham, and Anh V. Nguyen. "Multi-topic misinformation blocking with budget constraint on online social networks." IEEE Access 8 (2020): 78879-78889.
[39] Arun, M., E. Baraneetharan, A. Kanchana, and S. Prabu. "Detection and monitoring of the asymptotic COVID-19 patients using IoT devices and sensors." International Journal of Pervasive Computing and Communications (2020).
[40] Kumar, M. Keerthi, B. D. Parameshachari, S. Prabu, and Silvia Liberata Ullo. "Comparative Analysis to Identify Efficient Technique for Interfacing BCI System." In IOP Conference Series: Materials Science and Engineering, vol. 925, no. 1, p. 012062. IOP Publishing, 2020.