Diabetes Prediction model using streamlit
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
"Jnana Sangama", Belagavi – 590018
INTER/INTRA INSTITUTIONAL INTERNSHIP
Submitted in partial fulfillment of the requirements for the award of the degree
Bachelor of Engineering
In
ELECTRONICS AND COMMUNICATION
Submitted by
DEEPTHI A : 4GL21EC015
Under the guidance of
Prof. Pavithra T
M.Tech
HOD(Group A)
Dept. E&C Engineering
GEC Kushalnagar -571 234
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
GOVERNMENT ENGINEERING COLLEGE, KUSHALNAGAR.
Government Engineering College, Kushalnagar-571234
Chapter 1
Abstract
Python is a popular, interpreted, object-oriented programming language. It is used for
software development, game development, scripting and mathematics. It was created by
Guido van Rossum and first released in 1991. Python 1.0 was released in 1994 with
features such as lambda, map and filter. Python 2.0 was released in 2000 with
features such as cycle-detecting garbage collection and list comprehensions. Python 3.0, which
is not fully backward compatible with Python 2, was released in 2008. Python 3 offers an
extensive standard library and is supported by third-party tools such as Flask and MongoDB
drivers. The Jupyter Notebook app is a server-client
application that allows editing and running notebook documents via a web browser. The
Jupyter Notebook app can be run on a local desktop without internet access, or it can
be installed on a remote server and accessed through the internet. In addition to
displaying, editing and running notebook documents, the Jupyter Notebook app has a dashboard, a
control panel that shows local files and allows opening notebook documents or shutting down
their kernels. As an alternative to this tool, Colaboratory, or Colab for short, is a product from
Google Research. Colab allows anybody to write and execute arbitrary Python code through
the browser and is especially well suited to machine learning, data analysis and education.
More technically, Colab is a hosted Jupyter notebook service that requires no setup to use,
while providing free access to computing resources including GPUs.
Streamlit is an open-source Python library that makes it easy to build
custom web apps for machine learning and data science. It
allows you to create highly interactive dashboards with only some knowledge of Python.
Applications built with Streamlit offer a good user interface and support many
user-friendly widgets, and they are easy to create.
We will create an application using Streamlit which will predict whether a user has
diabetes or not. The dataset contains 8 predictor variables and 1 target variable. The
attributes of the dataset are described in Section 5.1. The target variable is named
Outcome and is encoded as 0 or 1, where 1 represents diabetic.
Chapter 2
Introduction
According to the National Institutes of Health (NIH), “Diabetes is a disease that occurs when
your blood glucose, also called blood sugar, is too high.” Most of the food we eat is broken
down into a sugar called glucose, and insulin is the hormone that enables glucose to get into
our cells. Diabetes is caused by the body’s inability to produce enough insulin or to properly
utilize the insulin it produces, resulting in excess glucose in the blood, which leads to significant
health issues. Although there is no cure for diabetes, you can take steps to manage it and
preserve your health. There are three major types of diabetes: type 1, type 2 and gestational
diabetes.
(1) Type 1 diabetes: Your body does not produce insulin. Your immune system attacks
and destroys the insulin-producing cells in your pancreas. Type 1 diabetes is most commonly
diagnosed in children and young adults, although it can appear at any age. People with type 1
diabetes need to take insulin every day.
(2) Type 2 diabetes: Your body does not make or use insulin well. It is important to get
tested if you are at risk, because symptoms may not appear. Type 2 diabetes can be delayed
or prevented by leading a healthy lifestyle.
(3) Gestational diabetes: This develops in some women during pregnancy and usually goes
away once the baby is born. If you have had gestational diabetes, you are more likely to
develop type 2 diabetes later, and your baby is more likely to suffer from health issues,
including type 2 diabetes.
In this project we build a diabetes prediction model using machine learning. Machine learning
is very useful in the medical field for detecting many diseases at an early stage, and diabetes
prediction is one such application. We will also see how to deploy a machine learning model
using Streamlit.
Chapter 3
Company profile
3.1 Formation of company
Aqmenz Automation Private Limited is a private company incorporated on 15th October 2018. It is
classified as a non-government company and is registered with the Registrar of Companies, Bangalore.
Brief history of company
Aqmenz Automation Pvt Ltd (AAPL) is situated in RT Nagar, in the northern part of Bangalore,
Karnataka. AAPL provides mechanical design and automation solutions to its client
companies. AAPL is also involved in open-source robotics and has developed different varieties of
robots.
AAPL also started INDOSKILL, a separate platform for the students to get training and work
on various Real Time Industrial Projects. Indoskill offers skill-oriented hands-on training
through an online platform.
Field of Expertise: Open-source Robotics, Industrial Automation, Product Design, Python &
Deep Learning and Embedded Systems
Objectives
AAPL believes in the Skill India mission and vision; hence our utmost priority is to impart
skills to the young generation and make them productive for the nation.
We aim to provide industrial automation training skill module kits to institutions,
universities and college lab facilities at the lowest possible price for the benefit of
technical students.
Identifying young entrepreneurs, and motivating and training them to establish startups that
create employment as well as prosperity for the nation.
Consulting, sourcing and supplying highly skilled manpower to industry for better
efficiency and productivity.
Providing low-cost and precise industrial automation solutions. We are eager to find
simple solutions to the most complex industrial problems.
3.2 Vision and mission
Our motto and vision are to create awareness among the young generation and train them to
meet current and future job demands, while helping students and employees to meet the
essential requirements of future human resources and
skill demands. We are in the fourth industrial revolution. Technological change is more
disruptive than ever before; hence continuous awareness of the need to upgrade skills is
essential. Aqmenz Automation Pvt. Ltd. is working to help and enhance the potential of
students and employees, so that future human resources will be beneficial, purposeful
and profitable to the nation.
Major Milestones
We have undertaken many industrial projects. Our major clients are BIAL (Bangalore
International Airport Limited), GE (General Electric) and Amics technologies.
About the company:
Organization structure
The organization has three departments: design, software, and sales and marketing.
AAPL
DESIGN | SOFTWARE | SALES & MARKETING
Service offered
Design and automation solutions.
All types of automation projects for companies using PLCs, SCADA and embedded systems.
Robots and robotic solutions for small and medium-scale companies.
Embedded solutions for companies such as GE.
Technical skill-oriented training programs for engineering colleges.
Robotics and automation lab equipment for colleges.
Number of people working in company and their responsibilities:
There are 20 people in this company, including:
Shamanna Mohan, Chief Executive Officer (CEO)
Mohammed Azhar Hussain, Chief Technology Officer (CTO)
3.3 Ongoing projects
Automation related projects
CNC Machines
Open-source Custom Robots
Garment Industry slider project
Chapter 4
System analysis
4.1 Existing system:
The current methods used for diabetes prediction, such as manual diagnosis or basic
statistical analysis, are not very accurate or efficient.
The existing system has limitations such as low accuracy, a time-consuming process
and a lack of automation.
In the existing system, a single data mining technique is used to diagnose the disease.
There is no previous research that identifies which data mining technique provides
the most reliable accuracy.
It uses only internal measures of fasting plasma glucose to predict
type 2 diabetes.
Diabetes prediction using algorithms such as k-nearest neighbours, k-means and branch
and bound has been proposed. A basic diabetes dataset was chosen for carrying out
the comparative analysis. The importance of feature analysis for predicting diabetes
with machine learning techniques was discussed.
4.2 Proposed system:
The proposed system treats classification of the Pima Indians Diabetes dataset as a
binary classification problem.
This is to be achieved through machine learning and deep learning classification
algorithms.
For machine learning, the SVM algorithm is proposed, whereas for deep learning a neural
network is used.
The proposed system aims to improve prediction accuracy through deep learning
techniques.
The system offers early detection, personalized predictions, scalability
and automation. By analyzing various data sources and variables, the model can
potentially identify patterns and indicators of diabetes at an early stage, allowing for
timely intervention and treatment.
The proposed system can handle a large volume of data and can be easily scaled to
accommodate a growing number of users.
Chapter 5
Project working and flow
The work flow to build the Machine learning project to predict diabetes is as follows:
1. Collection of data
2. Exploring the data
3. Splitting the data
4. Training the model
5. Evaluating the model
6. Deploying the model
5.1 Collection of data:
The very first step is to choose the dataset for our model. We can get a lot of different
datasets from Kaggle. You just need to sign in to Kaggle and search for any dataset you
need for the project. The objective is to predict whether a patient has diabetes based on
diagnostic measurements. Several constraints were placed on the selection of these
instances from a larger database.
The data contains 9 columns, which are as follows:
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Blood Pressure: Diastolic blood pressure (mm Hg)
Skin Thickness: Triceps skin fold thickness (mm)
Insulin: 2-hour serum insulin (mu U/ml)
BMI: Body mass index (weight in kg/(height in m)^2)
Diabetes Pedigree Function: Diabetes pedigree function
Age: Age (years)
Outcome: Class variable (0 or 1)
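As a quick sanity check on the schema described above, the following sketch builds a tiny DataFrame from the two sample records quoted later in this report and verifies that it has the nine expected columns. The column spellings (for example BloodPressure without a space) follow the variable names used in the app code and are an assumption about the CSV header; the helper name check_schema is hypothetical.

```python
import pandas as pd

# Column names assumed to match the CSV header (the exact spelling is an assumption).
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def check_schema(df):
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return [c for c in EXPECTED_COLUMNS if c not in df.columns]


# Two sample records quoted later in this report (non-diabetic, then diabetic).
rows = [
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
]
sample = pd.DataFrame(rows, columns=EXPECTED_COLUMNS)

missing = check_schema(sample)
print("Missing columns:", missing)
print("Number of predictors:", sample.shape[1] - 1)
```

The same check could be run on the DataFrame returned by pd.read_csv before any training is attempted.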
5.2 Exploring the Data
Now we have to set up the development environment to build our project. For this project, we
build the diabetes prediction model in Google Colab. You can
also use Jupyter Notebook.
After downloading the dataset, import the necessary libraries to build the model.
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
import pickle
Load the data using the read_csv function in the pandas library. Then the head() method of the
DataFrame is used to print the first rows, up to the limit we specify. The default number of rows
is five.
# Load the diabetes dataset to a pandas DataFrame
diabetes_dataset = pd.read_csv('diabetes.csv')
# Print the first 5 rows of the dataset
diabetes_dataset.head()
[Output: first five rows of the dataset]
# To get the number of rows and columns in the dataset
diabetes_dataset.shape
#prints (768, 9)
# To get the statistical measures of the data
diabetes_dataset.describe()
[Output: summary statistics for each column]
The Outcome column is clearly the output variable, so let us explore it in more
detail.
# To get details of the outcome column
diabetes_dataset['Outcome'].value_counts()
[Output: number of rows for each value of Outcome]
In the output, the value 1 means the person has diabetes, and 0 means the person does not
have diabetes. We can see the total count of people with and without diabetes.
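To see what value_counts() reports, here is a minimal sketch on a small made-up Outcome column. The ten labels are illustrative only and are not taken from the dataset. Passing normalize=True returns class proportions instead of counts, which is a convenient way to check for class imbalance before splitting the data.

```python
import pandas as pd

# Illustrative labels only: 1 = diabetic, 0 = not diabetic.
outcome = pd.Series([0, 1, 0, 0, 1, 0, 0, 1, 0, 0], name="Outcome")

counts = outcome.value_counts()                # absolute number of rows per class
shares = outcome.value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```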
5.3 Splitting the data
The next step in building the machine learning model is splitting the data into training
and testing sets. Here the data is split in a ratio of 80:20 (test_size = 0.2), so that most of
the data is used for training while a held-out portion is kept for testing.
# Separate the features and the labels
X = diabetes_dataset.drop(columns='Outcome')
Y = diabetes_dataset['Outcome']
# Print the independent variables
print(X)
[Output: the 8 predictor columns for all rows]
# To print the outcome variable
print(Y)
[Output: the Outcome value for all rows]
#Split the data into train and test
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, stratify=Y,
random_state=2)
print(X.shape, X_train.shape, X_test.shape)
Output:
(768, 8) (614, 8) (154, 8)
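The effect of test_size and stratify can be checked without the CSV file. The sketch below uses a synthetic stand-in with the same shape as the dataset (768 rows, 8 features); the random features and the 500/268 class split are assumptions chosen only for illustration. With test_size = 0.2, 154 rows go to the test set and 614 to the training set, matching the shapes printed above, and stratification keeps the share of positive labels almost the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the dataset: 768 rows, 8 features.
# The 500/268 class split is an assumed, illustrative imbalance.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 8))
y_demo = np.array([0] * 500 + [1] * 268)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

print(X_tr.shape, X_te.shape)
# Stratification keeps the share of positives nearly equal in both parts.
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```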
5.4 Training the model
The next step is to build and train our model. We are going to use a Support vector classifier
algorithm to build our model.
# Build the model
classifier = svm.SVC(kernel='linear')
# Train the support vector Machine Classifier
classifier.fit(X_train, Y_train)
After training, we use the model to predict outcomes for both the training data and the test
data, and then calculate the accuracy score of each set of predictions.
# Accuracy score on the training data
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy score of the training data : ', training_data_accuracy)
# Accuracy score on the test data
X_test_prediction = classifier.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score of the test data : ', test_data_accuracy)
Output:
Accuracy score of the training data: 0.7833876221498371
Accuracy score of the test data: 0.7727272727272727
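Accuracy is simply the fraction of predictions that match the true labels. The short sketch below works through that arithmetic by hand and compares it with accuracy_score. The five labels are made up for illustration and are not model outputs.

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration (1 = diabetic, 0 = not diabetic).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

# Manual computation: number of matching positions divided by the total.
matches = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = matches / len(y_true)

# accuracy_score takes the true labels first, then the predictions.
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)  # 4 of 5 match, so both are 0.8
```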
5.5 Evaluating the model
To check the model on a single record, we pass one set of measurements and print the prediction.
input_data = (5,166,72,19,175,25.8,0.587,51)
# Change the input_data to a numpy array
input_data_as_numpy_array = np.asarray(input_data)
# Reshape the array for one instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
prediction = classifier.predict(input_data_reshaped)
print(prediction)
if (prediction[0] == 0):
    print('The person is not diabetic')
else:
    print('The person is diabetic')
Output:
The person is diabetic
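The reshape step matters because scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), while a single record converted with np.asarray is 1-D. A short sketch of the shapes involved, using the record from the code above:

```python
import numpy as np

# One patient record with the 8 predictor values used above.
record = (5, 166, 72, 19, 175, 25.8, 0.587, 51)

as_array = np.asarray(record)     # 1-D array, shape (8,)
as_row = as_array.reshape(1, -1)  # 2-D array, shape (1, 8): one sample, 8 features

print(as_array.shape, as_row.shape)
```

The -1 tells NumPy to infer the number of columns, so the same line would work if the number of features changed.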
Saving the file
# Save the trained model
filename = 'trained_model.sav'
pickle.dump(classifier, open(filename, 'wb'))
# Load the saved model
loaded_model = pickle.load(open('trained_model.sav', 'rb'))
Once you run this code a new file named trained_model.sav will be saved in the project folder.
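A slightly more defensive way to save and reload a model is to use with blocks, so that the file handles are closed promptly. The sketch below fits a small SVC on made-up data and writes it to a temporary directory; the toy data and the file name toy_model.sav are illustrative only. Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import svm

# Tiny made-up data, used only to have a fitted model to save.
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_toy = np.array([0, 0, 1, 1])
model = svm.SVC(kernel="linear").fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.sav")  # hypothetical file name

    # Save: the with-block closes the file even if dump() raises.
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Load the model back from disk.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print("Round trip preserved predictions:", same)
```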
5.6 Deploying the model
Model deployment is one of the most important steps, and the final one, in building a machine
learning project. There are many frameworks available for deploying a machine learning model
on the web; Django and Flask are among the most widely used in Python. However, building a
user interface with these frameworks typically requires some knowledge of front-end
languages such as HTML, CSS and JavaScript.
Streamlit makes it possible to deploy a machine learning model without knowledge of
front-end languages, and it is quite easy to use. So, we will use the Streamlit framework to
deploy our model.
Although Streamlit has many advantages over the other frameworks, it is still under active
development and more features are being added. If you are getting started in machine learning,
this framework is a good first choice for deploying your model on the web.
Python Code to Deploy ML model using Streamlit
To install Streamlit run the following command in the command prompt or terminal.
pip install streamlit
5.7 Source code with Explanation
Open a new Python file and put the following code.
App.py
import numpy as np
import pickle
import streamlit as st

# Load the saved model
loaded_model = pickle.load(open('C:/Users/ELCOT/Downloads/trained_model.sav', 'rb'))

# Create a function for prediction
def diabetes_prediction(input_data):
    # Convert the text inputs to a numeric numpy array
    input_data_as_numpy_array = np.asarray(input_data, dtype=float)
    # Reshape the array as we are predicting for one instance
    input_data_reshaped = input_data_as_numpy_array.reshape(1, -1)
    prediction = loaded_model.predict(input_data_reshaped)
    print(prediction)
    if prediction[0] == 0:
        return 'The person is not diabetic'
    else:
        return 'The person is diabetic'

def main():
    # Give a title
    st.title('Diabetes Prediction Web App')
    # Get the input data from the user (text inputs return strings)
    Pregnancies = st.text_input('Number of Pregnancies')
    Glucose = st.text_input('Glucose Level')
    BloodPressure = st.text_input('Blood Pressure value')
    SkinThickness = st.text_input('Skin Thickness value')
    Insulin = st.text_input('Insulin Level')
    BMI = st.text_input('BMI value')
    DiabetesPedigreeFunction = st.text_input('Diabetes Pedigree Function value')
    Age = st.text_input('Age of the Person')
    # Code for prediction
    diagnosis = ''
    # Create a button for prediction
    if st.button('Diabetes Test Result'):
        diagnosis = diabetes_prediction([Pregnancies, Glucose, BloodPressure, SkinThickness,
                                         Insulin, BMI, DiabetesPedigreeFunction, Age])
    st.success(diagnosis)

if __name__ == '__main__':
    main()
Save the file after typing the code. Then, to run the app with Streamlit, open the command
prompt and run the following command.
streamlit run App.py
(or)
streamlit run filename.py
After running the command, the web app is served by a local web server and opens in your
browser. If it does not open automatically, go to your browser and type localhost:8501. The
following output will be shown.
[Output: screenshot of the Diabetes Prediction Web App input form]
Sample input data for a person who does not have diabetes: {1, 85, 66, 29, 0, 26.6, 0.351, 31}.
Entering these values generates the following output in the web app.
[Output: screenshot of the web app result for this input]
Sample input data for a person who has diabetes: {6, 148, 72, 35, 0, 33.6, 0.627, 50}.
Entering these values generates the following output in the web app.
[Output: screenshot of the web app result for this input]
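Because st.text_input returns strings, the values typed into the app need to be converted to numbers before they reach the model. The following sketch shows one way to do that conversion with basic validation, using the second sample record above. The names parse_inputs and FIELD_NAMES are hypothetical; they are not part of Streamlit or of the original app.

```python
import numpy as np

# Field order assumed to match the order of the model's 8 predictors.
FIELD_NAMES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def parse_inputs(raw_values):
    """Convert text-box strings to a (1, 8) float array, or raise ValueError."""
    if len(raw_values) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} values, got {len(raw_values)}")
    numbers = []
    for name, text in zip(FIELD_NAMES, raw_values):
        try:
            numbers.append(float(text.strip()))
        except ValueError:
            raise ValueError(f"{name} must be a number, got {text!r}") from None
    return np.asarray(numbers).reshape(1, -1)


# The sample record above, as it would arrive from the text boxes.
typed = ["6", "148", "72", "35", "0", "33.6", "0.627", "50"]
features = parse_inputs(typed)
print(features.shape, features.dtype)
```

In the app, the ValueError could be caught and reported with st.error so that an empty or mistyped field shows a message instead of reaching the model.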
Chapter 6
Results
Early intervention: With the help of a diabetes prediction model, healthcare providers
can intervene early and provide targeted interventions to help individuals manage
their health and prevent complications.
Resource optimization: By accurately predicting the likelihood of diabetes, healthcare
resources can be allocated more efficiently, ensuring that those who need it most
receive the necessary care and attention.
Personalized predictions: Machine learning algorithms can analyze large amounts of
data and provide personalized risk assessments based on individual characteristics, such
as age, lifestyle and medical history.
Improved accuracy: By leveraging complex patterns in data, machine learning models
can potentially achieve higher accuracy in predicting diabetes compared to traditional
methods.
Cost-effective: Implementing a machine learning model for diabetes prediction can
potentially reduce healthcare costs by focusing resources on high-risk individuals and
preventive measures.
Research opportunities: Developing a diabetes prediction model using machine
learning opens up avenues for further research and understanding of the disease,
leading to advancements in treatment and prevention strategies.
Chapter 7
Advantages and Disadvantages
7.1 Advantages:
Early detection: Machine learning models can help identify individuals at risk of
developing diabetes at an early stage, allowing for timely intervention and prevention.
Personalized predictions: Machine learning algorithms can analyze large amounts of
data and provide personalized risk assessments based on individual characteristics, such
as age, lifestyle and medical history.
Improved accuracy: By leveraging complex patterns in data, machine learning models
can potentially achieve higher accuracy in predicting diabetes compared to traditional
methods.
Cost-effective: Implementing a machine learning model for diabetes prediction can
potentially reduce healthcare costs by focusing resources on high-risk individuals and
preventive measures.
Resource optimization: By accurately predicting the likelihood of diabetes, healthcare
resources can be allocated more efficiently, ensuring that those who need it most
receive the necessary care and attention.
Research opportunities: Developing a diabetes prediction model using machine
learning opens up avenues for further research and understanding of the disease,
leading to advancements in treatment and prevention strategies.
7.2 Disadvantages:
Data quality: The accuracy of machine learning models relies heavily on the quality
and representativeness of the data used for training. If the data is incomplete, biased
or of poor quality, it can affect the reliability of the predictions.
Interpretability: Some machine learning models, such as deep learning algorithms, can
be complex and difficult to interpret. Understanding how the model arrives at its
predictions may pose challenges for healthcare professionals.
Ethical Considerations: There are ethical concerns related to the use of personal health
data for predictive modelling. Ensuring privacy, consent, and fair use of data is crucial
in developing responsible and ethical machine learning models.
Limited generalizability: Machine learning models trained on specific populations
may not generalize well to diverse populations or different regions due to variations in
lifestyle, genetics, and healthcare systems.
User acceptance: Some individuals may be hesitant to embrace predictive models due
to concerns about privacy, data security, and the potential for discrimination based on
predicted health outcomes.
Expertise and infrastructure requirements: Developing and implementing a diabetes
prediction model requires expertise in machine learning, access to quality data, and
the necessary computational resources.
Chapter 8
CONCLUSION
Machine learning is a rapidly growing field in computer science. It has applications in nearly
every other field of study and has already been implemented commercially, because machine
learning can solve problems that are too difficult or too time-consuming for humans to solve.
This report has given a simple overview of some techniques and algorithms in machine learning.
There are many more techniques that apply machine learning as a solution. In the future,
machine learning will play an important role in our daily life.
Python has a simple syntax similar to the English language. Its syntax allows
developers to write programs with fewer lines than some other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.
In this internship we mainly learnt the basics of machine learning using Python.