Tutorial 3 Logistic Regression Solutions 1

This document provides instructions for several logistic regression modeling exercises using various datasets. Key points: 1. Analyze diabetes data to predict diabetes diagnosis, achieving good prediction for non-diabetics but poorer prediction for diabetics. Variables like age and glucose are associated with diabetes. 2. Candy data predicts candy types like chocolate moderately well but others poorly. A multinomial logistic model categorizes candy types better. 3. Spirits drink data advises which brand image statements (e.g. mysterious, social) best fit each brand based on consumption context (home vs away). 4. Excel solver is used to build a diabetes model for comparison to SPSS results. Model accuracy may differ

Uploaded by

springfield12

EFIM30051: Data Analytics and Artificial Intelligence

Tutorial 3: Logistic Regression

Instructions

1. If you are unsure, try repeating the examples shown in the lecture without
looking at the lecture itself.
2. Data: Diabetes data.sav.
The variable “outcome” indicates whether a person has
diabetes (1) or not (0).
a. Explore the data set in tables with “outcome” as the top (column) variable and
the independent variables as the rows.

Run a tables analysis to see whether some variables are more strongly
associated with diabetes than others. For example, it looks as though older
people and people with higher glucose levels are more likely to have diabetes.
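For readers working outside SPSS, the same kind of table can be sketched in Python. This is only an illustration: the rows below are made up, and the names `age_group` and `outcome` are assumptions rather than the variable names in Diabetes data.sav.

```python
from collections import Counter

# Hypothetical toy rows of (age_group, outcome); NOT the real Diabetes data.
rows = [
    ("21-30", 0), ("21-30", 0), ("21-30", 1),
    ("31-50", 0), ("31-50", 1), ("31-50", 1),
    ("51+", 1), ("51+", 1), ("51+", 0), ("51+", 1),
]

def crosstab_percent(rows):
    """Return the share of outcome == 1 within each row category."""
    totals = Counter(group for group, _ in rows)
    positives = Counter(group for group, outcome in rows if outcome == 1)
    return {group: positives[group] / totals[group] for group in totals}

shares = crosstab_percent(rows)
for group, share in shares.items():
    print(f"{group}: {share:.0%} diabetic")
```

A category whose share is clearly higher than the others is a candidate predictor for the model in part b.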

b. Which variables predict the diabetes outcome? Is this as you
would expect? How well does the model perform and what are the
limitations?
Create a binary logistic regression model with Outcome as the dependent variable.
Choose either Age OR Age group (you cannot include both).

Look at the parameter estimates (sig. levels and B estimates).

Remove the non-significant variables and re-run the model, iterating until all
remaining variables are significant. Eventually you arrive at the final model.

All variables are significant and sensible.
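The remove-and-refit loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not what SPSS does internally: the data are synthetic, the variable names (`glucose`, `age`, `noise`) are hypothetical, the fit is a hand-written Newton-Raphson routine, and the 0.05 cut-off is the usual convention rather than something fixed by the tutorial.

```python
import math
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficients (B) and two-sided Wald p-values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, p_values

# Synthetic, standardised predictors (hypothetical; NOT the Diabetes data).
rng = np.random.default_rng(0)
n = 400
columns = {
    "glucose": rng.normal(size=n),
    "age": rng.normal(size=n),
    "noise": rng.normal(size=n),  # unrelated to the outcome by construction
}
true_logit = -0.5 + 1.2 * columns["glucose"] + 0.6 * columns["age"]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

kept = list(columns)
while kept:
    X = np.column_stack([np.ones(n)] + [columns[name] for name in kept])
    beta, p_values = fit_logit(X, y)
    worst = int(np.argmax(p_values[1:]))  # skip the intercept
    if p_values[1:][worst] < 0.05:
        break  # every remaining variable is significant
    kept.pop(worst)  # drop the least significant variable and refit

print("Variables kept:", kept)
```

Dropping one variable at a time and refitting matters because the significance of the remaining variables can change once a correlated variable leaves the model.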

Look at the classification table.

The model does a good job of predicting those who are not diagnosed as
diabetic, but a poorer job of predicting those who are.
We can alter the threshold, making it harder for respondents to be
classified as 0 (“non-diabetic”), so that we correctly predict a
larger share of those with diabetes. However, the price we pay is that
more respondents without diabetes are now classified as having
diabetes. You need to decide on the balance and on which
“error” you are happier to absorb. The example output uses a 0.2
threshold (set in the Options menu).
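The trade-off from moving the threshold can be seen in a small Python sketch. The predicted probabilities and outcomes here are invented for illustration, and the rule "classify as 1 when the probability is at or above the threshold" is an assumption about how the cut-off is applied.

```python
def classification_table(actual, prob, threshold):
    """Count (actual, predicted) pairs, predicting 1 when prob >= threshold."""
    table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, pr in zip(actual, prob):
        table[(a, int(pr >= threshold))] += 1
    return table

# Hypothetical outcomes and model probabilities; NOT the tutorial results.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
prob = [0.05, 0.10, 0.15, 0.25, 0.30, 0.55, 0.18, 0.35, 0.45, 0.80]

for cut in (0.5, 0.2):
    t = classification_table(actual, prob, cut)
    diabetics_found = t[(1, 1)] / (t[(1, 1)] + t[(1, 0)])
    non_diabetics_found = t[(0, 0)] / (t[(0, 0)] + t[(0, 1)])
    print(f"threshold {cut}: diabetics correct {diabetics_found:.0%}, "
          f"non-diabetics correct {non_diabetics_found:.0%}")
```

In this toy data, lowering the threshold from 0.5 to 0.2 raises the share of diabetics correctly identified from 25% to 75%, while the share of non-diabetics correctly identified falls from 83% to 50%.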

Some expected insulin to be in the model. The issue here is that, if you run
a correlation analysis, insulin is correlated with Glucose, and hence the
model will only choose one of them.

You can force insulin into the model by starting with it and then
progressively adding other terms (but leaving out Glucose).
This gives the resulting classification (threshold reset to 0.5). It is not as
good a model as the one with Glucose, but a model with insulin instead of
glucose may be more logical (depending on the circumstances).
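The correlation check mentioned above can be sketched as follows. The six glucose and insulin readings are hypothetical values chosen to move together; they are not taken from the Diabetes data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical readings; NOT the tutorial data.
glucose = [85, 90, 110, 130, 150, 170]
insulin = [40, 60, 90, 120, 160, 200]

r = pearson(glucose, insulin)
print(f"correlation between glucose and insulin: {r:.3f}")
```

When two predictors are this strongly correlated they carry much the same information, so once one is in the model the other adds little and tends to appear non-significant.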

c. The variable “Diabetes Pedigree function” is a mathematical concept
based on heredity data. How well does this variable perform in
predicting diabetes in patients?

The variable does not do a good job of predicting diabetes on its own; it
needs other variables alongside it to help it predict.

3. Data: “candy.xls”.
a. Explore the Candy data in Excel and then import the data to SPSS.
Create appropriate labels in the Variable view from information in the
Excel file. Save as a .sav file.
b. Using the scale variables as independent variables, how well can each
of the candy types be predicted (one model per candy type)? Interpret
the models statistically and also from a lay audience perspective.
i. chocolate
ii. fruity
iii. caramel
iv. peanutyalmondy
v. nougat
vi. crispedricewafer

Use the binary logistic regression function to create a model for each one of the above.
Some models are better than others. Note that the sugar coefficient is not statistically significant.

Chocolate

Fruity
Caramel

This model is not doing anything to predict the candy type.

Peanut almond
A very low level of prediction.

Nougat

No level of prediction (we may as well classify everything as not nougat).
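The point about calling everything "not nougat" is a comparison against a majority-class baseline. A minimal sketch, using invented counts rather than the real candy data:

```python
def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)

# Hypothetical labels: 2 nougat products out of 20 (NOT the real counts).
nougat = [0] * 18 + [1] * 2

print(f"always predicting 'not nougat' is right {baseline_accuracy(nougat):.0%} of the time")
```

A model is only useful for a rare candy type if its classification accuracy beats this baseline, which is why a high overall percentage correct can still mean no real prediction.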

c. Either in SPSS syntax OR in Excel, create a new variable with the
following properties:
i. 0 if the product is NEITHER a chocolate NOR a fruity
ii. 1 if the product is a fruity
iii. 2 if the product is a chocolate
iv. 3 if the product is both chocolate and fruity
Use the remaining variables which you think are a sensible inclusion to
try to predict this multiple category variable. What conclusions do you
draw and how would you report this to management?
We need to use Multinomial Logistic Regression, as we have more than 2 categories to
predict.
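The recoding in part c can be written as a one-line rule. The sketch below is in Python rather than SPSS syntax or Excel, and the function name `choc_fruit_code` is hypothetical; it assumes the chocolate and fruity indicators are coded 0/1.

```python
def choc_fruit_code(chocolate, fruity):
    """0 = neither, 1 = fruity only, 2 = chocolate only, 3 = both."""
    return 2 * int(chocolate) + int(fruity)

# Check all four combinations of the two 0/1 indicators.
for chocolate in (0, 1):
    for fruity in (0, 1):
        print(chocolate, fruity, "->", choc_fruit_code(chocolate, fruity))
```

The same arithmetic (2 × chocolate + fruity) works as a computed column in Excel or a COMPUTE statement in SPSS.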

The model is not very good when category 3 is used as the base.

If we use the first category as the reference category, the model does improve in terms of
interpretation.
The classification does OK at identifying only chocolate or only fruity, but not both or neither.
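It may help to see why changing the reference category affects interpretation rather than fit. In a multinomial logistic model, re-expressing the coefficients against a different base leaves the predicted probabilities unchanged. The sketch below shows this for one product with made-up linear predictor values.

```python
import math

def probabilities(linear_predictors):
    """Category probabilities from linear predictors (the base has 0)."""
    largest = max(linear_predictors.values())
    exps = {k: math.exp(v - largest) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical linear predictors for one product, with category 3 as the base.
vs_base3 = {0: 1.2, 1: 0.4, 2: 0.9, 3: 0.0}
# Re-express against category 0 by subtracting category 0's value.
vs_base0 = {k: v - vs_base3[0] for k, v in vs_base3.items()}

p_base3 = probabilities(vs_base3)
p_base0 = probabilities(vs_base0)
for category in sorted(p_base3):
    print(category, round(p_base3[category], 4), round(p_base0[category], 4))
```

Both columns print the same probabilities, so the choice of base should be driven by which comparison is easiest to explain, for example "versus neither chocolate nor fruity".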

4. Data: “spirits drinks.sav”

The data refers to a questionnaire distributed to 8000 respondents to
enquire about their choice of three spirits brands. The drink choice
indicates the drink they had chosen (Brand A, Brand B or Brand C).
The image statements refer to the drink, and the respondent had to
state whether that particular statement was influential in why they
chose that brand (score of 1) or was of no relevance to why they
chose that brand (score of 0). The last variable refers to whether the
occasion was at home/friend’s home/etc. or whether it was away from
home (e.g., in a bar/restaurant/café/etc.). Each of the three brands is
owned by your organisation and the Marketing Director is asking your
advice on how the marketing spend should be directed in terms of
appealing to various aspects of each brand. The aspects in question
are the following.
High opinion
Seductive
Mysterious
I would drink it with friends
The Marketing Director needs direction on which image statements
should be associated with each brand.
a. By creating an appropriate model, advise the Marketing Director which
elements from their list above should be associated with each of the
brands and why. How would you describe the “fit/classification” of your
model statistically and what does that mean for the brand managers of
each of the brands (A, B and C)?

Here are the parameter estimates. They are all versus a base of “brand C”. The idea is
to find the statements in question and see whether some of them are statistically higher (+ coef)
than brand C and others statistically lower (- coef) than brand C (or vice versa). This gives you a
distinctive positioning you can use to market the brands under those circumstances.
For example, “seductive” is not statistically different for brand A compared to brand C,
but it is for brand B compared to brand C (larger coef and significant p-value).
The same logic applies when checking the other statements.
Where both brand A and brand B have statistically lower coefs than brand C, it means that
brand C would be more appropriate for that statement.
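The reading rule above (sign of the coefficient plus its significance, always relative to brand C) can be sketched in Python. The B values and p-values below are invented to mirror the pattern described, "Statement X" is a placeholder, and the 0.05 significance level is an assumed convention; none of these numbers come from the SPSS output.

```python
import math

def describe(b, p_value, alpha=0.05):
    """Turn one coefficient versus the base brand into a plain statement."""
    if p_value >= alpha:
        return "no significant difference from Brand C"
    direction = "more" if b > 0 else "less"
    return f"{direction} associated than Brand C (odds ratio {math.exp(b):.2f})"

# Hypothetical (B, p-value) pairs versus base Brand C; NOT the tutorial output.
estimates = {
    ("Brand A", "Seductive"): (0.05, 0.62),
    ("Brand B", "Seductive"): (0.48, 0.001),
    ("Brand A", "Statement X"): (-0.35, 0.004),
    ("Brand B", "Statement X"): (-0.41, 0.002),
}

for (brand, statement), (b, p_value) in estimates.items():
    print(f"{brand} / {statement}: {describe(b, p_value)}")
```

Exponentiating B gives the odds ratio, which is usually easier to communicate to a brand manager than the raw coefficient.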

b. The Marketing manager of brand A is also working on a pilot project
with some bars in the Bristol area to promote brand A. They are trying
to come up with the most important image statements which appeal to
that brand in this environment and are considering the following to
create the bar environment.
i. High opinion of the brand
ii. Focus on attractive packaging
iii. Make the atmosphere of the bar “daring and mysterious”
iv. Promote the brand being mixed with a Bristol based mixer drink
v. Charge a premium price as the consumers will be happy to pay
a higher price for the experience
Advise the Manager on what you would recommend. How confident
are you, statistically, of the advice and how would you communicate
this to the brand manager?

This is essentially the same exercise as above with one added step: you
must select only the data relating to away-from-home consumption. You
can do this in SPSS by selecting Data/Select Cases. Click on the “If condition
is satisfied” option. Then select the HomeAway variable and move it to the top
box. Then set this “=1”. Click Continue and then OK.
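Outside SPSS, the Select Cases step is a simple row filter. In this sketch the respondents are invented and the `brand` and `mysterious` fields are hypothetical; only the HomeAway = 1 condition follows the step described above.

```python
# Hypothetical respondents; only the HomeAway = 1 filter mirrors the tutorial.
respondents = [
    {"brand": "A", "HomeAway": 1, "mysterious": 1},
    {"brand": "B", "HomeAway": 0, "mysterious": 0},
    {"brand": "A", "HomeAway": 0, "mysterious": 1},
    {"brand": "C", "HomeAway": 1, "mysterious": 0},
]

def select_cases(rows, variable, value):
    """Keep only the rows where `variable` equals `value`."""
    return [row for row in rows if row[variable] == value]

away = select_cases(respondents, "HomeAway", 1)
print(len(away), "of", len(respondents), "respondents were away from home")
```

The model is then refitted on the filtered rows only, so the sample is smaller and the estimates are typically less precise than in part a.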
5. Load the solver Excel sheet from the lecture resources on Blackboard. Try
using the Diabetes data to build a model using solver as a form of estimation.
How do the results compare to question 2?

See the Excel sheet I have uploaded. You may very well have got a different
answer depending on the constraints and initial starting values.
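The dependence on starting values can be illustrated with a small numerical sketch. It uses plain gradient ascent on the log-likelihood as a stand-in for Solver (this is not the algorithm Excel uses), with synthetic data and arbitrary starting points, all of which are assumptions for illustration.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood: sum of y*z - log(1 + exp(z))."""
    z = X @ beta
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def gradient_ascent(X, y, start, steps, rate=0.5):
    """Climb the log-likelihood from a chosen starting point."""
    beta = np.array(start, dtype=float)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        beta = beta + rate * (X.T @ (y - p)) / len(y)
    return beta

# Synthetic data with one standardised predictor (NOT the Diabetes data).
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))).astype(float)

for steps in (3, 5000):
    from_zero = gradient_ascent(X, y, [0.0, 0.0], steps)
    from_far = gradient_ascent(X, y, [3.0, -3.0], steps)
    print(steps, "steps:", from_zero.round(3), from_far.round(3),
          "log-likelihood:", round(log_likelihood(from_zero, X, y), 2))
```

Stopped early, the two starting points give visibly different coefficients; run to convergence, they agree. A Solver answer that differs from SPSS often means the search stopped before reaching the maximum or was held back by a constraint.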
