Medical Insurance Prediction Slides
PROJECT OVERVIEW
• The objective of this case study is to predict the health insurance cost incurred by individuals based on their age, gender, BMI, number of children, smoking habit, and geo-location.
https://www.publicdomainpictures.net/en/view-image.php?image=279909&picture=medical-insurance
PROJECT OVERVIEW
• The goal is first to obtain a relationship (model) between two variables only, such as age and insurance cost.
[Figure: scatter plot of insurance cost ($) against age (years), with the fitted line labelled "MODEL (GOAL)"]

𝑦 = 𝑏 + 𝑚 ∗ 𝑥
MULTIPLE LINEAR REGRESSION:
INTUITION
• Recall that simple linear regression is a statistical model that examines the linear relationship between two variables only.
• Multiple linear regression extends this to examine the relationship between more than two variables.
• Each independent variable has its own corresponding coefficient.
𝑦 = 𝑏₀ + 𝑏₁ ∗ 𝑥₁ + 𝑏₂ ∗ 𝑥₂ + … + 𝑏ₙ ∗ 𝑥ₙ
(estimated/predicted)
REGRESSION METRICS: MEAN
ABSOLUTE ERROR (MAE)
• Mean Absolute Error (MAE) is obtained by calculating the absolute difference between the model predictions and the true (actual) values.
• MAE is a measure of the average magnitude of error generated by the regression model.
• The mean absolute error (MAE) is calculated as follows:

𝑀𝐴𝐸 = (1/𝑛) ∑ᵢ₌₁ⁿ |𝑦ᵢ − 𝑦̂ᵢ|
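The MAE formula above can be sketched in a few lines and cross-checked against scikit-learn; the sample values here are made up for illustration.

```python
# Minimal sketch of Mean Absolute Error: average of |y_i - yhat_i|.
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([12000.0, 8500.0, 30400.0, 4100.0])
y_pred = np.array([11000.0, 9000.0, 28000.0, 5000.0])

# MAE = (1/n) * sum(|y_i - yhat_i|)
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual)  # 1200.0
```

Both values agree; the manual computation is only there to make the formula concrete.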
REGRESSION METRICS: R-SQUARED (R²):
COEFFICIENT OF DETERMINATION
• R-squared represents the proportion of variance of the dependent variable (y) that has been explained by the independent variables.
• R-squared provides an indication of goodness of fit.
• It gives a measure of how well unseen samples are likely to be predicted by the model, through the proportion of explained variance.
• Maximum value is 1.
• A constant model that always predicts the expected value of y, disregarding the input features, will have an R²
score of 0.0.
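The two claims above (maximum of 1, constant model scoring 0) can be checked directly; the numbers below are illustrative.

```python
# Hedged sketch: R-squared computed as 1 - SS_res/SS_tot and compared with
# scikit-learn's r2_score for a constant (mean-predicting) model.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot                  # close to (but below) 1

# A constant model that always predicts mean(y_true) scores exactly 0
r2_constant = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(r2_manual, r2_constant)
```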
REGRESSION METRICS:
ADJUSTED R-SQUARED (ADJUSTED R²)
• If R² = 0.8, this means that 80% of the increase in medical insurance cost is due to the increase in the applicant's age.
• Let's add another 'useless' independent variable, say "color of car", to the Z-axis (note that we are trying to predict the medical insurance cost and not the car insurance cost!).
• Now R² increases, even though the added variable is useless:
[Figure: 3D plot of insurance cost against applicant's age and color of applicant's car]
REGRESSION METRICS: ADJUSTED R-SQUARED (ADJUSTED R²)
• One limitation of R² is that it increases when independent variables are added to the model, which is misleading since some added variables might be useless, with minimal significance.
• Adjusted R² overcomes this issue by adding a penalty if we make an attempt to add an independent variable that does not improve the model.
• Adjusted R² is a modified version of R² that takes into account the number of predictors in the model.
• If useless predictors are added to the model, adjusted R² will decrease.
• If useful predictors are added to the model, adjusted R² will increase.

𝑅²ₐ𝑑𝑗 = 1 − (1 − 𝑅²)(𝑛 − 1) / (𝑛 − 𝑘 − 1)

• 𝑘 is the number of independent variables and 𝑛 is the number of samples.
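The standard adjusted-R² formula can be sketched as a small helper; the R² value, n, and k below are illustrative, not from the course dataset.

```python
# Adjusted R-squared: penalizes R2 for the number of predictors k used.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# The same raw R2 looks worse once more predictors are "spent" to reach it:
few_predictors = adjusted_r2(0.80, n=100, k=1)    # close to 0.80
many_predictors = adjusted_r2(0.80, n=100, k=10)  # noticeably lower
print(few_predictors, many_predictors)
```

This is why adding a useless predictor that leaves R² unchanged still lowers adjusted R².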
ARTIFICIAL NEURAL
NETWORKS FOR
REGRESSION
NEURON MATHEMATICAL MODEL
• The neuron collects signals from input channels named dendrites, processes
information in its nucleus, and then generates an output in a long thin branch
called axon.
[Figure: biological neuron with inputs X1, X2, X3 and weights W1, W2, W3; signals arrive on the dendrites, are processed in the nucleus, and leave along the axon]
DO YOU REMEMBER OUR FIRST
NEURON MODEL?
• The bias allows shifting the activation function curve up or down.
• Number of adjustable parameters = 4 (3 weights and 1 bias).
• Activation function “F”.
[Figure: single neuron with inputs X1, X2, X3 (inputs/independent variables), weights W1, W2, W3, bias b, and activation function F]

𝑦 = 𝑓(𝑋₁𝑊₁ + 𝑋₂𝑊₂ + 𝑋₃𝑊₃ + 𝑏)
SINGLE NEURON MODEL IN ACTION!
• Inputs: 𝑋₁ = 1, 𝑋₂ = 3, 𝑋₃ = 4
• Weights: 𝑊₁ = 0.7, 𝑊₂ = 0.1, 𝑊₃ = 0.3; bias 𝑏 = 0
• Output: 𝑦 = 𝑓(1 ∗ 0.7 + 3 ∗ 0.1 + 4 ∗ 0.3 + 0) = 𝑓(2.2)
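The single-neuron computation above is a weighted sum plus a bias, which can be written directly (the activation is left as the identity here, just to expose the raw sum):

```python
# Single neuron forward pass: weighted sum of inputs plus bias.
inputs  = [1.0, 3.0, 4.0]      # X1, X2, X3
weights = [0.7, 0.1, 0.3]      # W1, W2, W3
b = 0.0

z = sum(x * w for x, w in zip(inputs, weights)) + b
print(z)  # 2.2 (up to floating-point rounding)
```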
ACTIVATION
FUNCTIONS
ACTIVATION FUNCTIONS
• SIGMOID:
o Takes a number and squashes it to a value between 0 and 1.
o Converts large negative numbers to ~0 and large positive numbers to ~1.
o Generally used in the output layer.
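The sigmoid behaviour described above can be sketched in a few lines:

```python
# Sigmoid activation: squashes any real number into the open interval (0, 1).
import math

def sigmoid(w: float) -> float:
    return 1.0 / (1.0 + math.exp(-w))

print(sigmoid(0.0))    # 0.5
print(sigmoid(-10.0))  # close to 0
print(sigmoid(10.0))   # close to 1
```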
[Figure: network of neurons with inputs P1, P2, P3, weight matrices, and biases b1, b2]

𝑎 = 𝑓(𝑊𝑃 + 𝑏)
𝑎₂ = 𝑓(𝑃₁𝑊₁ + 𝑃₂𝑊₂ + 𝑃₃𝑊₃ + 𝑏₂)
MULTI-LAYER PERCEPTRON NETWORK
𝑊 = [ 𝑊₁,₁ 𝑊₁,₂ ⋯ 𝑊₁,ₙ
      𝑊₂,₁ 𝑊₂,₂ ⋯ 𝑊₂,ₙ
      ⋮          ⋱ ⋮
      𝑊ₘ₋₁,₁ 𝑊ₘ₋₁,₂ ⋯ 𝑊ₘ₋₁,ₙ
      𝑊ₘ,₁ 𝑊ₘ,₂ ⋯ 𝑊ₘ,ₙ ]

Non-linear sigmoid activation function:
𝜑(𝑤) = 1 / (1 + 𝑒⁻ʷ)

m: number of neurons in the hidden layer
N: number of inputs
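One hidden layer with m neurons and N inputs, matching the m × N weight matrix above, can be sketched as a single matrix-vector product followed by the sigmoid; the sizes and random weights are arbitrary placeholders.

```python
# Hidden-layer forward pass: a = sigmoid(W @ p + b), with W of shape (m, N).
import numpy as np

def sigmoid(w):
    return 1.0 / (1.0 + np.exp(-w))

m, N = 4, 3                      # m neurons in the layer, N inputs
rng = np.random.default_rng(0)
W = rng.normal(size=(m, N))      # row i holds neuron i's weights
b = np.zeros(m)                  # one bias per neuron
p = np.array([1.0, 3.0, 4.0])    # input vector

a = sigmoid(W @ p + b)           # activations of the m hidden neurons
print(a.shape)  # (4,)
```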
HOW DO ANNS
TRAIN?
ANN TRAINING PROCESS
[Figure: training loop. Training/testing inputs X feed the network, which produces a predicted output Y; the error between the predicted and the desired (true) output, measured as the Mean Square Error, is tracked over epochs and drives the weight updates]
• The data set is generally divided into 80% for training and 20% for testing.
• Sometimes we might include a cross-validation dataset as well, and then we divide it into 60%, 20%, 20% segments for training, validation, and testing, respectively (numbers may vary).
1. Training set: used for gradient calculation and weight update.
2. Validation set:
o Used for cross-validation, which is performed to assess training quality as training proceeds.
o Cross-validation is implemented to overcome over-fitting (over-training). Over-fitting occurs when the algorithm focuses on training set details at the cost of losing generalization ability.
o The trained network's MSE might be small during training, but during testing the network may exhibit poor generalization performance.
3. Testing set: used for testing the trained network.
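The 60/20/20 split described above can be produced with two successive calls to scikit-learn's train_test_split; X and y here are synthetic placeholders.

```python
# 60/20/20 train/validation/test split via two successive splits.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)

# First peel off 20% for testing, then 25% of the remainder for validation
# (0.25 * 0.80 = 0.20 of the full set).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```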
GRADIENT DESCENT
GRADIENT DESCENT
• The size of the steps taken is called the learning rate.
• If the learning rate increases, the area covered in the search space per step increases, so we might reach the global minimum faster.
• However, we can overshoot the target.
• For small learning rates, training will take much longer to reach the optimized weight values.
GRADIENT DESCENT
𝜕𝑓/𝜕𝑚 = (2/𝑛) ∑ᵢ₌₁ⁿ (𝑦̂ᵢ − 𝑦ᵢ) 𝑥ᵢ
𝜕𝑓/𝜕𝑏 = (2/𝑛) ∑ᵢ₌₁ⁿ (𝑦̂ᵢ − 𝑦ᵢ)

(with 𝑦̂ᵢ = 𝑚𝑥ᵢ + 𝑏)
GRADIENT DESCENT
𝐿𝑜𝑠𝑠 𝐹𝑢𝑛𝑐𝑡𝑖𝑜𝑛: 𝑓(𝑚, 𝑏) = (1/𝑛) ∑ᵢ₌₁ⁿ (𝑦̂ᵢ − 𝑦ᵢ)²
*Note: in reality, this graph is 3D and has three axes, one each for m, b, and the sum of squared residuals.
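Gradient descent on this loss, with 𝑦̂ᵢ = m·xᵢ + b, can be sketched as below; the data, learning rate, and iteration count are illustrative choices.

```python
# Gradient descent for simple linear regression on the MSE loss
# f(m, b) = (1/n) * sum((yhat_i - y_i)^2), yhat_i = m*x_i + b.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                  # assumed true line: m = 2, b = 1

m, b = 0.0, 0.0
lr = 0.05                          # learning rate (step size)
for _ in range(2000):
    y_hat = m * x + b
    grad_m = (2 / len(x)) * np.sum((y_hat - y) * x)   # df/dm
    grad_b = (2 / len(x)) * np.sum(y_hat - y)         # df/db
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 3), round(b, 3))  # converges near m = 2, b = 1
```

A larger lr would converge in fewer steps here, but pushing it too high makes the updates overshoot and diverge, which is the trade-off described on the previous slide.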
BACKPROPAGATION
BACK PROPAGATION
STEP 1: FORWARD PASS
STEP 2: ERROR CALCULATION
STEP 3: BACKPROPAGATION (GRADIENT CALCULATION)
STEP 4: WEIGHT UPDATE
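The four steps can be sketched for a single sigmoid neuron on one sample; the inputs, weights, target, and learning rate below are illustrative, and the squared-error loss is an assumption.

```python
# Backpropagation loop for one sigmoid neuron and one training sample.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 3.0, 4.0])
w = np.array([0.7, 0.1, 0.3])
b, target, lr = 0.0, 1.0, 0.1

for _ in range(500):
    # Step 1: forward pass
    z = np.dot(w, x) + b
    y_hat = sigmoid(z)
    # Step 2: error calculation (squared error)
    error = (y_hat - target) ** 2
    # Step 3: backpropagation (chain rule through the loss and the sigmoid)
    dz = 2 * (y_hat - target) * y_hat * (1 - y_hat)
    # Step 4: weight (and bias) update
    w -= lr * dz * x
    b -= lr * dz

print(error)  # shrinks toward 0 as training proceeds
```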