Session 15: Logistic Regression


Logistic Regression

Dr. Rajiv Kumar


IIM Kashipur

Note: Content used in this PPT has been compiled from various sources.
Introduction

Logistic regression is a method for analyzing relative probabilities between discrete outcomes (binary or categorical dependent variables)
 Binary outcome: standard logistic regression
 e.g., Pass (1) or Fail (0)
 Categorical outcome: multinomial logistic regression
 e.g., Engineer (1), Doctor (2), Lawyer (3), or Teacher (4)

In general, logistic regression describes how predictor variables affect the probability of an event; the significant variables identify the most promising target customers or the most critical variable combinations.
Binary Logistic Regression

 The logistic equation (the cumulative logistic distribution, or logit) is written as a function of z, where z is a measure of the total contribution of the variables x used to predict the outcome:

Probability (event) = P = 1 / (1 + e^(-z)), where z = b0 + b1x1 + b2x2 + ... + bkxk

 Coefficients are determined by maximum likelihood estimation (MLE), so larger sample sizes are needed than for OLS
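As a quick numeric check of the equation above, a minimal R sketch (the function name `logistic` is illustrative):

```r
# The logistic function maps any real z to a probability in (0, 1):
# z = b0 + b1*x1 + ... + bk*xk is the linear contribution of the predictors
logistic <- function(z) 1 / (1 + exp(-z))

logistic(0)    # z = 0 gives probability 0.5
logistic(4)    # large positive z -> probability near 1
logistic(-4)   # large negative z -> probability near 0
```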
Graph of the Logistic Function

[Figure: S-shaped curve of Probability (Event) against z]
Mathematical Interpretation of Logistic Regression

Adopting the S-shaped curve gives the probability of purchase p on the y-axis:

p = 1 / (1 + e^(-z))

where z = b0 + b1x1 + ... + bkxk

Equivalently, the odds are p/(1 - p) = e^z, so ln(p/(1 - p)) = z: the log-odds (logit) is a linear function of the predictors.
Model Fit

 Smaller the -2LL -> higher the likelihood

 Cox and Snell R-square: as close to 1 as possible

 Nagelkerke R-square: as close to 1 as possible (rescales Cox and Snell so that a maximum of 1 is attainable)
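These statistics can be computed directly from a fitted glm object; a sketch using the standard Cox and Snell and Nagelkerke formulas on simulated illustrative data (not the session's brand-loyalty data):

```r
# Simulated binary data for illustration only
set.seed(1)
df <- data.frame(x = rnorm(50))
df$y <- rbinom(50, 1, 1 / (1 + exp(-df$x)))

fit <- glm(y ~ x, family = binomial, data = df)

minus2LL_null  <- fit$null.deviance   # -2LL of the intercept-only model
minus2LL_model <- fit$deviance        # -2LL of the fitted model
n <- nrow(df)

# Cox and Snell R-square: 1 - exp(-(model chi-square)/n)
cox_snell  <- 1 - exp((minus2LL_model - minus2LL_null) / n)
# Nagelkerke R-square rescales Cox and Snell so its maximum is 1
nagelkerke <- cox_snell / (1 - exp(-minus2LL_null / n))

c(cox_snell, nagelkerke)
```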
Forward LR Method

 In the Forward LR approach, the algorithm starts with a model having no independent variables, only a constant term. This is Block 0, or the Beginning Block.

 Then predictor variables are added one by one; at each step the variable giving the largest decrease in -2LL enters the model.

 At every stage the model chi-square is computed, which is the difference between -2LL for the previous model and the current model. If this difference is no longer significant, the algorithm stops.
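The procedure above can be approximated with R's built-in step() function; a minimal sketch on simulated illustrative data (note that step() selects by AIC rather than a chi-square significance test, so it is only an approximation of the Forward LR method):

```r
# Simulated data: only x1 truly drives the outcome
set.seed(2)
df <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
df$y <- rbinom(60, 1, 1 / (1 + exp(-2 * df$x1)))

# Block 0: constant-only model
null_model <- glm(y ~ 1, family = binomial, data = df)

# Add predictors one at a time while the criterion keeps improving
fwd <- step(null_model, scope = ~ x1 + x2 + x3,
            direction = "forward", trace = 0)
formula(fwd)
```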
Forward LR Method

 Also, at any stage the probability of the event is computed as p = 1/(1 + e^(-z)) from the current model's coefficients.
Example

 Brand_Loyalty_Data.xlsx
 Output Variable: Loyalty
 Input Variables: Brand, Product, Shopping

No. Loyalty Brand Product Shopping
1   1  4  3  5
2   1  6  4  4
3   1  5  2  4
4   1  7  5  5
5   1  6  3  4
6   1  3  4  5
7   1  5  5  5
8   1  5  4  2
9   1  7  5  4
10  1  7  6  4
11  1  6  7  2
12  1  5  6  4
13  1  7  3  3
14  1  5  1  4
15  0  7  5  5
16  0  3  1  3
17  0  4  6  2
18  0  2  5  2
19  0  5  2  4
20  0  4  1  3
21  0  3  3  4
22  0  3  4  5
23  0  3  6  3
24  0  4  4  2
25  0  6  3  6
26  0  3  6  3
27  0  4  3  2
28  0  3  5  2
29  0  5  5  3
30  0  1  3  2
Logistic Regression in R (1 of 3)

library(readxl)
library(MASS)
mydata <- read_excel("Brand_Loyalty_Data.xlsx")
# Print the data frame
mydata

# Call glm for logistic regression; family = binomial is required,
# otherwise glm fits an ordinary linear model by default
mymodel <- glm(Loyalty ~ Brand + Product + Shopping, family = binomial, data = mydata)

summary(mymodel)
# mymodel$linear.predictors  # Use this to see the z-values
# mymodel$coefficients       # Use this to see the coefficients
Logistic Regression in R (2 of 3)

Output

Interpreting the output:

z = -0.475180 + 0.161153*Brand + 0.007827*Product + 0.047897*Shopping
P = 1/(1 + e^(-z))
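Plugging illustrative attitude scores into the fitted equation above reproduces a predicted probability by hand:

```r
# Coefficients as reported in the output above
b0 <- -0.475180; b_brand <- 0.161153; b_product <- 0.007827; b_shopping <- 0.047897

# Illustrative scores (not a row from the data set)
brand <- 5; product <- 4; shopping <- 3

z <- b0 + b_brand * brand + b_product * product + b_shopping * shopping
p <- 1 / (1 + exp(-z))
p   # probability of being loyal
```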
Logistic Regression in R (3 of 3)

Confusion matrix and Hit Ratio (Accuracy) of the model

# Predicting the case/cases:
z_values <- predict(mymodel, mydata)  # Here we use the training data; one can use test data
# One can use mymodel$linear.predictors to get z_values directly
z_values
Loyalty_Predicted <- 1/(1 + exp(-z_values)) > 0.5

# Output (confusion matrix)
table(ActualValue = mydata$Loyalty, PredictedValue = Loyalty_Predicted)

Hit Ratio (Accuracy) = Total Correct Predictions / (Total Correct Predictions + Total False Predictions)
= 16/(16 + 14)
= 53.3%
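The hit ratio is the sum of the confusion matrix diagonal over its total; a minimal sketch with an illustrative 2x2 table (not the session's output):

```r
# Confusion matrix: rows = actual, columns = predicted (illustrative values)
conf <- table(ActualValue    = c(0, 0, 1, 1, 1),
              PredictedValue = c(0, 1, 1, 1, 0))

# Hit ratio = correct predictions / all predictions
hit_ratio <- sum(diag(conf)) / sum(conf)
hit_ratio
```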
Logistic Regression in R (3 of 3)

Predicting a case

# Predicting a case: Brand=1, Product=3, Shopping=2
l1 <- list("Brand" = 1, "Product" = 3, "Shopping" = 2)  # One can use a data frame instead of a list
z <- predict(mymodel, newdata = l1)
z
prob <- 1/(1 + exp(-z))
prob
PredictedValue <- 1/(1 + exp(-z)) > 0.5
PredictedValue

Output: z value, probability, predicted value (class)
