Breast Cacner Detection

Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

378

BREAST CANCER DETECTION USING


ML Model
ANSH GOEL(22BPS1061)

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING, VELLORE INSTITUE OF TECHNOLOGY,


CHENNAI 600127, INDIA

Abstract—Breast cancer is one of the most prevalent forms of


can analyze clinical features derived from imaging or
cancer worldwide, and early detection plays a crucial role in histopathological data, providing reliable predictions on
improving survival rates. In this study, we present a machine tumor malignancy.
learning-based web application designed to predict breast cancer
using clinical data. The application leverages a combination of In this research, we developed a web-based application
data preprocessing techniques and supervised learning algorithms
to classify malignant and benign tumors with high accuracy. Built for breast cancer detection, utilizing machine learning
on modern web technologies such as Flutterflow, the platform algorithms to predict the likelihood of malignancy. The
provides an accessible and user-friendly interface for both application integrates the Breast Cancer Wisconsin
healthcare professionals and patients. The machine learning (Diagnostic) Dataset, which contains key diagnostic
model, developed and trained using the Breast Cancer Wisconsin features extracted from breast mass cell nuclei. The
(Diagnostic) Dataset, employs techniques such as feature scaling, project focuses on designing a streamlined, user-friendly
data normalization, and hyperparameter tuning to enhance
predictive performance. The final model achieved a significant
interface for medical professionals and patients, while
level of accuracy, demonstrating the potential of AI-driven embedding a sophisticated machine learning backend.
solutions in assisting medical professionals with early diagnosis. The chosen classification algorithms, such as logistic
regression, support vector machines, and random
This research contributes to the growing body of literature on
the use of machine learning in medical diagnostics by providing
forests, were fine-tuned through extensive
an end-to-end solution that integrates advanced machine learning experimentation and feature engineering to achieve high
models into a deployable web application. Future work will predictive accuracy.
involve expanding the dataset, refining the model’s accuracy, and
exploring its adaptability to other types of cancer or medical
conditions.

I. INTRodUcTIoN

Breast cancer remains one of the leading causes of cancer-


related deaths among women worldwide. Despite significant
advancements in medical research and treatment methods, early
detection continues to be the most critical factor in improving
patient outcomes and survival rates. Traditional diagnostic
methods, such as mammography and biopsies, though effective,
can be invasive, expensive, and require specialized expertise.
Recent developments in artificial intelligence (AI) and machine Fig 1. breast mass cell nuclei
learning (ML) have introduced the potential for non-invasive,
cost-effective diagnostic tools that can assist healthcare
This paper discusses the development of this breast
professionals in detecting malignancies earlier and more
cancer detection system, from data preprocessing and
accurately.
model training to deployment on a web platform. Our
approach emphasizes the integration of explainable AI
Machine learning, with its ability to analyze large techniques, making the predictions interpretable to
datasets and recognize complex patterns, has shown clinicians, thus fostering trust in machine learning tools.
remarkable promise in various medical applications, By providing a readily deployable, accessible solution,
particularly in cancer detection. The automation of the this work seeks to contribute to the global efforts in
diagnostic process through ML algorithms allows for reducing breast cancer mortality through early, reliable,
faster, more consistent results, reducing the risk of and non-invasive diagnostic aids.
human error. In breast cancer diagnosis, ML models

Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.
379

Fig. 2 WORKING OF MODEL.

II. ReLATed WoRKs


Numerous studies have explored the
application of machine learning (ML) and III. MeTHodoLogY
deep learning in breast cancer detection, In this section, the methodology used for the proposed
significantly improving diagnostic accuracy method is given.
and early intervention. Key works in this
domain include comprehensive reviews of 1. Data Collection and Preprocessing:
deep learning approaches to detect breast
 Input Features: The model uses 30 features
cancer using various imaging modalities related to tumor characteristics, such as radius,
such as mammography, ultrasound, and texture, perimeter, area, smoothness, and
MRI. These studies highlight the symmetry.
effectiveness of convolutional neural  Data Source: The dataset is sourced from
networks (CNNs) in enhancing diagnostic Kaggle (e.g., yasserh/breast-cancer-dataset).
sensitivity and specificity. Another notable  Preprocessing:
area of research focuses on integrating  Load data into a Pandas DataFrame.
genetic data with imaging for more accurate
 Drop irrelevant columns (e.g., "id").
predictions. Below are some related works:
 Encode the target variable ('diagnosis')
1. ”Deep Learning Approaches to Detect
using Label Encoding.
Breast Cancer: A Comprehensive
 Handle missing values
Review"
with SimpleImputer.
 Explores deep learning techniques
across various imaging modalities. 2. Exploratory Data Analysis (EDA):
 Generate descriptive statistics and visualizations
2."Breast Cancer Detection Using (histograms, heatmaps) to understand feature
Artificial Intelligence Techniques: A distributions and correlations.
Review"
 Reviews machine learning, ensemble 3. Feature Scaling and Splitting:
techniques, and deep learning in breast  Standardize features using StandardScaler.
cancer prediction.  Split the dataset into training (80%) and testing
(20%) sets.
3."Breast Cancer Detection Using
Machine Learning in Digital 4. Model Selection and Training:
Mammography"  Evaluate multiple classifiers: Logistic
 Discusses ML algorithms applied to Regression, Decision Tree, Random Forest,
digital mammography for improved SVC, KNN, Naive Bayes, Gradient Boosting,
diagnostic accuracy. and XGBoost.
 Use RandomizedSearchCV for hyperparameter
4."The BCPM Method: Decoding tuning.
Breast Cancer with Machine Learning"
 Introduces a comprehensive machine
learning model for precise breast
cancer diagnosis..
Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.
380

5. Model Evaluation:  A ROC: 0.92


 Assess models using metrics: c
Accuracy, Precision, Recall, c Random Forest:
F1 Score, and AUC-ROC. u  Best CV Score: 0.96
 Compile results into a r  Accuracy: 95%
DataFrame to identify the best- a  Precision: 95%
performing model. c  Recall: 95%
y  F1-Score: 95%
6. Real-Life Data Collection: :  AUC-ROC: 0.96
 Collect input features through
medical imaging 9
(mammograms, MRIs), biopsy  Support Vector Classifier:
1
results, clinical tests, and  Best CV Score: 0.94
%
electronic health records  Accuracy: 93%
 P
(EHR).  Precision: 93%
r
e  Recall: 93%
Conclusion
c  F1-Score: 93%
This methodology provides a structured  AUC-ROC: 0.94
i
approach to breast cancer detection using
s
machine learning, focusing on data
i K-Nearest Neighbors:
preprocessing, model evaluation, and real-life  Best CV Score: 0.90
data collection strategies. o
n  Accuracy: 89%
:  Precision: 89%
IV. ExpeRIMeNTAL ResULTs  Recall: 89%
9  F1-Score: 89%
This section presents the results obtained from 1  AUC-ROC: 0.90
evaluating various machine learning models on %
the breast cancer dataset. Each model was  R Gaussian Naive Bayes:
assessed using cross-validation and tested on a e  Best CV Score: 0.88
separate test set. The performance metrics c  Accuracy: 87%
include Best Cross-Validation Score, Accuracy, a  Precision: 87%
Precision, Recall, F1-Score, and AUC-ROC. l  Recall: 87%
l
 F1-Score: 87%
1. Model Performance Summary :
 AUC-ROC: 0.88
9
 Gradient Boosting:
2. Detailed Model Analysis 1
%  Best CV Score: 0.95
 Logistic Regression:  Accuracy: 94%
 Best CV Score: 0.95  F
1  Precision: 94%
 Accuracy: 94%
-
 Precision: 94% (True
S
Positives / (True Positives +
c
False Positives))
o
 Recall: 94% (True Positives /
r
(True Positives + False
e
Negatives))
:
 F1-Score: 94% (Harmonic
mean of Precision and Recall) 9
 AUC-ROC: 0.95 (Area Under 1
the Receiver Operating %
Characteristic Curve)  A
U
 Decision Tree: C
 Best CV Score: 0.92 -

Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.
381

V. CoNcLUsIoN
In this study, we evaluated the performance of several Random Forest could be integrated into
machine learning models for breast cancer detection using a clinical workflows to assist healthcare
comprehensive dataset. The models assessed included professionals in making more accurate
Logistic Regression, Decision Tree, Random Forest, Support diagnoses. These models can serve as
Vector Classifier, K-Nearest Neighbors, Gaussian Naive decision support tools, potentially improving
Bayes, Gradient Boosting, and XGBoost. Each model was patient outcomes through earlier and more
rigorously tested using cross-validation techniques to ensure accurate detection of breast cancer.
the reliability of the results, and various performance metrics  The interpretability of simpler models like
were calculated, including Best CV Score, Accuracy, Logistic Regression and Support Vector
Precision, Recall, F1-Score, and AUC-ROC. Classifier also highlights their potential utility
Key Findings: in clinical settings, where understanding the
1. Model Performance: rationale behind predictions is essential for
 XGBoost emerged as the top-performing gaining trust from both clinicians and
model with the highest Best CV Score of patients.
0.97 and an Accuracy of 96%. This indicates 4. Future Work:
its strong capability in distinguishing  Future research could explore the integration
between malignant and benign cases of additional features, such as genetic
effectively. information or imaging data, to enhance
 Random Forest and Gradient model performance further. Additionally,
Boosting also demonstrated robust implementing techniques like hyperparameter
performance, with Best CV Scores of 0.96 tuning and feature selection could optimize
and 0.95, respectively. These ensemble the models for even better accuracy and
methods leverage multiple decision trees to reliability.
improve predictive accuracy and reduce  It would also be beneficial to conduct
overfitting. external validation of the models on
 Logistic Regression and Support Vector independent datasets to assess their
Classifier performed well, achieving Best generalizability and robustness in real-world
CV Scores of 0.95 and 0.94, respectively. clinical scenario s.
These models are simpler and provide Repository Overview
interpretable results, making them valuable The accompanying repository contains the following
for clinical settings where understanding the components:
decision-making process is crucial.  Data: The dataset used for training and testing the
 K-Nearest Neighbors and Gaussian Naive models, including preprocessing scripts to ensure data
Bayes showed comparatively lower quality and integrity.
performance, with Best CV Scores of 0.90  Model Implementations: Code for each machine
and 0.88, respectively. While these models learning model, including training, evaluation, and
can be useful in certain contexts, they may performance metric calculations.
not be the best choice for this specific  Results: A detailed summary of the model
application. performances, including visualizations such as ROC
2. Importance of Metrics: curves and confusion matrices to aid in understanding
 The evaluation metrics highlighted the model behavior.
models' strengths and weaknesses. For  Documentation: Comprehensive documentation
instance, while accuracy is a critical outlining the methodology, model selection rationale,
measure, precision and recall are equally and instructions for replicating the experiments.
important in medical diagnostics, where In summary, this study demonstrates the potential of machine
false negatives can have severe learning models in breast cancer detection, with XGBoost
consequences. The F1-Score provides a showing the most promise. The findings underscore the
balance between precision and recall, importance of model selection and evaluation in developing
making it a valuable metric for assessing effective diagnostic tools that can enhance clinical decision-
model performance in this context. making and improve patient care.
 The AUC-ROC scores further illustrated the .
models' ability to discriminate between
classes, with XGBoost achieving an AUC- RefeReNces
ROC of 0.97, indicating excellent [1]  Deep Learning Approaches to Detect Breast Cancer: A
performance in distinguishing between Comprehensive Review"
malignant and benign cases. This review analyzes various deep learning techniques applied to breast
cancer detection across multiple imaging modalities.
3. Clinical Implications: Link
 The findings suggest that advanced [2]  "Breast Cancer Detection Using Machine Learning in Digital
ensemble methods like XGBoost and Mammography"
This study explores the application of machine learning algorithms to
Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.
382

digital mammography, highlighting improvements in screening detection with a FastAPI-based web application.
accuracy. Link
Link [20]  "Breast Cancer Detection Using Machine Learning and
[3]  "Breast Cancer Detection Using Machine Learning Algorithms" TensorFlow.js"
This paper discusses the implementation of different machine learning This repository provides an example of deploying a breast cancer
algorithms for breast cancer detection, focusing on their performance detection model using TensorFlow.js for in-browser inference.
metrics. Link
Link [21] 2017.
[4]  "The BCPM Method: Decoding Breast Cancer with Machine
Learning"
This research presents a comprehensive machine learning model for
precise breast cancer diagnosis.
Link
[5]  "Breast Cancer Detection Using Artificial Intelligence
Techniques: A Review"
This review examines various artificial intelligence techniques applied to
breast cancer detection, including machine learning and deep learning
methods.
Link
[6]  "Breast Cancer Detection Web Application"
This GitHub repository presents a web application developed for breast
cancer detection using machine learning models.
Link
[7]  "Breast_Cancer_Detection_ML-with-Web-End-Deployment"
This project demonstrates the deployment of a machine learning model
for breast cancer detection through a web interface.
Link
[8]  "Breast Cancer Detection Using Machine Learning"
This GitHub repository provides code and documentation for
implementing machine learning models in breast cancer detection.
Link
[9]  "Breast Cancer Detection Using Machine Learning and Web
Application"
This project integrates machine learning models for breast cancer
detection with a user-friendly web application interface.
Link
[10]  "Breast Cancer Detection Using Machine Learning and Flask"
This repository showcases the deployment of a breast cancer detection
model using Flask to create a web application.
Link
[11]  "Breast Cancer Detection Using Machine Learning and Django"
This project demonstrates the integration of machine learning models for
breast cancer detection into a Django-based web application.
Link
[12]  "Breast Cancer Detection Using Machine Learning and
Streamlit"
This repository provides an example of deploying a breast cancer
detection model using Streamlit for the web interface.
Link
[13]  "Breast Cancer Detection Using Machine Learning and React"
This project integrates a machine learning model for breast cancer
detection with a React-based front-end application.
Link
[14]  "Breast Cancer Detection Using Machine Learning and
Angular"
This repository demonstrates the deployment of a breast cancer detection
model using Angular for the web interface.
Link
[15]  "Breast Cancer Detection Using Machine Learning and Vue.js"
This project showcases the integration of a machine learning model for
breast cancer detection into a Vue.js-based web application.
Link
[16]  "Breast Cancer Detection Using Machine Learning and Node.js"
This repository provides an example of deploying a breast cancer
detection model using Node.js for the back-end server.
Link
[17]  "Breast Cancer Detection Using Machine Learning and
Express.js"
This project demonstrates the integration of a machine learning model
for breast cancer detection into an Express.js-based web application.
Link
[18]  "Breast Cancer Detection Using Machine Learning and Flask-
RESTful"
This repository showcases the deployment of a breast cancer detection
model using Flask-RESTful to create a web API.
Link
[19]  "Breast Cancer Detection Using Machine Learning and
FastAPI"
This project integrates a machine learning model for breast cancer
Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.
383

[22] M. R. Moosa, “Kidney transplantation in developing countries,” in Kid- [33] S. Christodoulidis, M. Anthimopoulos, L. Ebner, A. Christe, and
ney Transplantation–Principles and Practice (Seventh Edition). Elsevier, S. Mougiakakou, “Multisource transfer learning with convolutional
2014, pp. 643–675. neural networks for lung pattern analysis,” IEEE Journal of Biomedical
[23] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. and Health Informatics, vol. 21, no. 1, pp. 76–84, 2017.
Gotway, and et al, “Convolutional neural networks for medical image [34] S. A. Tuncer and A. Alkan, “A decision support system for detection of
analysis: Full training or fine tuning?” IEEE Transactions on Medical the renal cell cancer in the kidney,” Measurement, vol. 123, pp. 298–303,
Imaging, vol. 35, no. 5, pp. 1299–1312, 2016. 2018.
[24] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, and [35] A. Samat, P. Du, M. H. A. Baig, S. Chakravarty, and L. Cheng,
et al, “Deep convolutional neural networks for computer-aided detection: “Ensemble learning with multiple classifiers and polarimetric features
Cnn architectures, dataset characteristics and transfer learning,” IEEE for polarized SAR image classification,” Photogrammetric Engineering
Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016. & Remote Sensing, vol. 80, no. 3, pp. 239–251, 2014.
[25] D. Meng, L. Zhang, G. Cao, W. Cao, G. Zhang, and B. Hu, “Liver fi- [36] Y. N. Hwang, J. H. Lee, G. Y. Kim, E. S. Shin, and S. M. Kim,
brosis classification based on transfer learning and FCNet for ultrasound “Characterization of coronary plaque regions in intravascular ultrasound
images,” IEEE Access, vol. 5, pp. 5804–5810, 2017. images using a hybrid ensemble classifier,” Computer Methods and
[26] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A Programs in Biomedicine, vol. 153, pp. 83–92, 2018.
review and new perspectives,” IEEE Transactions on Pattern Analysis [37] T. Fawcett, “An introduction to roc analysis,” Pattern recognition letters,
and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. vol. 27, no. 8, pp. 861–874, 2006.
[27] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transac- [38] M. Subramanya, V. Kumar, S. Mukherjee, and M. Saini, “Classification
tions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345– of normal and medical renal disease using b-mode ultrasound images,”
1359, 2010. in IEEE, 2nd International Conference on Computing for Sustainable
[28] T. G. Dietterich, “Ensemble methods in machine learning,” in Interna- Global Development (INDIACom), 2015, pp. 1914–1918.
tional workshop on multiple classifier systems. Springer, 2000, pp. 1– [39] K. B. Raja, M. Madheswaran, and K. Thyagarajah, “Analysis of ul-
15. trasound kidney images using content descriptive multiple features for
[29] S.-C. Lo, S.-L. Lou, J.-S. Lin, M. T. Freedman, M. V. Chien, and S. K. disorder identification and ANN based classification,” in IEEE Inter-
Mun, “Artificial convolution neural network techniques and applications national Conference on Computing: Theory and Applications (ICCTA),
for lung nodule detection,” IEEE Transactions on Medical Imaging, 2007, pp. 382–388.
vol. 14, no. 4, pp. 711–718, 1995. [40] M. Subramanya, V. Kumar, S. Mukherjee, and M. Saini, “Svm-based
[30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification cac system for b-mode kidney ultrasound images,” Journal of digital
with deep convolutional neural networks,” in Advances in neural infor- imaging, vol. 28, no. 4, pp. 448–458, 2015.
mation processing systems, 2012, pp. 1097–1105. [41] K. Sharma and J. Virmani, “A decision support system for classifi-
[31] A. Menegola, M. Fornaciali, R. Pires, S. Avila, and E. Valle, “Towards cation of normal and medical renal disease using ultrasound images:
automated melanoma screening: Exploring transfer learning schemes,” A decision support system for medical renal diseases,” International
arXiv preprint arXiv:1609.01228, 2016. Journal of Ambient Computing and Intelligence, vol. 8, no. 2, pp. 52–69,
[32] R. Zhang, Y. Zheng, T. W. C. Mak, R. Yu, S. H. Wong, J. Y. Lau, 2017.
and et al, “Automatic detection and classification of colorectal polyps [42] J. Verma, M. Nath, P. Tripathi, and K. Saini, “Analysis and identification
by transferring low-level cnn features from nonmedical domain,” IEEE of kidney stone using kth nearest neighbour (KNN) and support vector
Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 41–47, machine (SVM) classification techniques,” Pattern Recognition and
2017. Image Analysis, vol. 27, no. 3, pp. 574–580, 2017.

Authorized licensed use limited to: Indian Institute of Information Technology Design & Manufacturing. Downloaded on November 10,2020 at 09:48:02 UTC from IEEE Xplore. Restrictions apply.

You might also like