Breast Cancer Prediction Using Machine Learning
Breast cancer is one of the most common cancers among women worldwide and a leading cause of cancer death among
women. However, if the cancer is detected early and treated properly, the condition can be cured. Early
detection of breast cancer can dramatically improve the prognosis and chances of survival by allowing patients to
receive timely clinical therapy. Furthermore, precise classification of benign tumours can help patients avoid
unnecessary treatment. This study uses a Convolutional Neural Network (CNN) for the image dataset and K-Nearest
Neighbour (KNN), Decision Tree (CART), Support Vector Machine (SVM), and Naïve Bayes for the numerical
dataset, whose features are obtained from digitised images of breast masses, to analyse the data and predict
cancer with improved accuracy. As part of the process, the datasets are analysed and evaluated, and the models
are trained. Finally, both image and numerical test data are used for prediction.
Keywords: IDC (Invasive Ductal Carcinoma), FNA (Fine Needle Aspirate), breast cancer prediction,
classifier algorithms, CNN (Convolutional Neural Network).
The “gold-standard” method for detecting cancer previously consisted of three parts: clinical evaluation,
radiological imaging, and pathology testing [18]. The proposed technique indicates the presence of cancer
based on regression, although newer algorithms are available. A model designed to predict new data should
give good results in both its training and testing phases [19]. There are three main steps: pre-processing,
feature extraction, and classification. Figure 1 shows the types of breast cancer; in this paper we consider IDC.
Radiology professionals frequently struggle with mammography mass lesion labelling, which can lead to
unneeded and costly breast biopsies. The paper's implementation was evaluated using three publicly
available benchmark datasets (the DDSM, INbreast, and BCDR databases) for training and testing, and the
MIAS dataset for testing only. The results showed that when PCNN is paired with CNN, it outperforms other
approaches on the same publicly available datasets [1].
If mammographic breast tissue is dense, federal law requires that the patient be notified, because increased
density is a sign of breast cancer risk and can impair the sensitivity of mammography. The goal of that work
was to externally validate a deep learning model using radiologist breast density evaluations in a community
breast imaging practice.
Figure 2 Data_Matrix_Representation
is derived from the anticipated and actual values.
3.5.2. K-Nearest Neighbour
KNN is a supervised machine learning algorithm used for classification. It calculates the
distances between a new data point and the points in the training data, and the class is then
decided by a vote among the nearest neighbours [30-32].
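The distance-and-vote procedure can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function name `knn_predict`, the choice of Euclidean distance, the value k = 3, and the toy two-feature records are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (assumed metric).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature records: made-up values, not taken from the paper's dataset.
train_points = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

prediction = knn_predict(train_points, train_labels, query=(2.9, 3.1), k=3)
print(prediction)  # the three nearest neighbours are all labelled "malignant"
```

An odd k is used here so that a two-class vote cannot tie. In practice the features would normally be scaled first, since distances are sensitive to the units of each feature.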
Figure 7 Plot of accuracy comparison
Figure 10 Plot of training and validation loss
Thus, SVM is selected for further analysis and prediction. The training data is used to fit the model, and
the test data is passed as input for prediction. Figure 8 provides a screenshot of the accuracy obtained
using SVM.
To make the system easy to use, a GUI is developed using the Flask framework, which connects the front
end to the back-end model to process the input and provide the prediction. Medical practitioners can enter
input values manually from patient records; on submission, the record is classified as malignant or benign.
An image can also be uploaded, which is then processed by the trained model to make a prediction.
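The way a Flask route can connect a submitted record to a back-end model can be sketched as follows. This is an illustrative sketch only, not the authors' application: the `/predict` route, the JSON field `features`, and the `classify_record` function are hypothetical, and `classify_record` is a placeholder standing in for the trained model, not a clinical rule. Image upload is not covered here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify_record(features):
    """Hypothetical stand-in for the trained model. NOT a clinical rule."""
    # Placeholder logic only: a real deployment would call the trained
    # classifier's prediction method here instead of this threshold.
    mean_value = sum(features) / len(features)
    return "malignant" if mean_value > 0.5 else "benign"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")

    # Reject missing, empty, or non-numeric input before it reaches the model.
    if not isinstance(features, list) or not features:
        return jsonify(error="'features' must be a non-empty list of numbers"), 400
    if not all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in features
    ):
        return jsonify(error="'features' must contain only numbers"), 400

    return jsonify(prediction=classify_record(features))
```

Posting a JSON body such as `{"features": [0.9, 0.8]}` to `/predict` returns a JSON object with a `prediction` field. Keeping the model call behind a single function means the placeholder can be replaced by the trained classifier without changing the route.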
Figure 8 Prediction accuracy using SVM for test data
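The train-then-predict workflow with SVM can be sketched with scikit-learn. This is a sketch under stated assumptions rather than the authors' code: scikit-learn's bundled Wisconsin diagnostic dataset stands in for the paper's numerical dataset, and the 80/20 split, feature scaling, and RBF kernel are illustrative choices, so the accuracy it prints need not match the figure reported in the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the paper's
# numerical dataset of features computed from digitised breast mass images.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the records as test data (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale the features, then fit an SVM with an RBF kernel on the training data.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

# Pass the unseen test data to the fitted model and measure accuracy.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.4f}")
```

Putting the scaler and the classifier in one pipeline ensures the scaling parameters are learned from the training data only, so the test data does not leak into the fitted model.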
Both authors have contributed equally to the work.
Figure 14 Numerical data record prediction:
Table 1: Comparison of accuracies obtained by different authors
Author                    Dataset            Method  Accuracy
1. Proposed methodology   Numerical dataset  SVM     96.48%
                                             CART    91.88%
We thank all the faculty members and friends for their expertise and assistance throughout all aspects of our
study and for their help in writing the manuscript.
REFERENCES
