Project On Statistical Methods For Decision Making: by Ameya Udapure


DECLARATION

I certify that

a. The work contained in this project has been done by me under the guidance of my
supervisor.
b. The work has not been submitted to any other Institute for any degree or diploma.
c. I have followed the guidelines provided by the Institute in preparing the project report.
d. I have conformed to the norms and guidelines given in the Ethical Code of Conduct of
the Institute.
e. Whenever I have used materials (data, theoretical analysis, figures, and text) from other
sources, I have given due credit to them by citing them in the text of the report and giving their
details in the references. Further, I have taken permission from the copyright owners of the
sources, whenever necessary.

Name of Student: Ameya Udapure
Signature of Student: ------------------------

Table of Contents

Executive Summary
Introduction
Problem 1
1.1 Use methods of descriptive statistics to summarize data. Which Region and which Channel
spent the most? Which Region and which Channel spent the least?
1.2 There are 6 different varieties of items that are considered. Describe and comment/explain all
the varieties across Region and Channel? Provide a detailed justification for your answer.
1.3 On the basis of a descriptive measure of variability, which item shows the most inconsistent
behavior? Which items show the least inconsistent behavior?
1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique with the
help of detailed comments.
1.5 On the basis of your analysis, what are your recommendations for the business? How can your
analysis help the business to solve its problem? Answer from the business perspective.
Problem 2
2.1. For this data, construct the following contingency tables (Keep Gender as row variable)
2.1.1. Gender and Major
2.1.2. Gender and Grad Intention
2.1.3. Gender and Employment
2.1.4. Gender and Computer
2.2. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question
2.2.1. What is the probability that a randomly selected CMSU student will be male?
2.2.2. What is the probability that a randomly selected CMSU student will be female?
2.3. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question
2.3.1. Find the conditional probability of different majors among the male students in CMSU.
2.3.2. Find the conditional probability of different majors among the female students of CMSU.
2.4. Assume that the sample is a representative of the population of CMSU. Based on the data,
answer the following question
2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
2.4.2. Find the probability that a randomly selected student is a female and does NOT have a
laptop.
2.5. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question
2.5.1. Find the probability that a randomly chosen student is a male or has full-time
employment?
2.5.2. Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.
2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The
Undecided students are not considered now and the table is a 2x2 table. Do you think the graduate
intention and being female are independent events?
2.7. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages.
2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages. For each of them comment whether they follow a normal
distribution. Write a note summarizing your conclusions.
Problem 3
3.1 Do you think there is evidence that mean moisture contents in both types of shingles are within
the permissible limits? State your conclusions clearly showing all steps.
3.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis
and conduct the test of the hypothesis. What assumption do you need to check before the test for
equality of means is performed?
Please reflect on all that you have learnt while working on this project. This step is critical in
cementing all your concepts and closing the loop. Please write down your thoughts

List of Figures
Figure 1 Hotel and Retail
Figure 2 Region and Channel
Figure 3 Box plot of Milk
Figure 4 Box plot of grocery
Figure 5 Box plot of Frozen
Figure 6 Box plot of Detergent Paper
Figure 7 Box plot of Deli
Figure 8 GPA
Figure 9 Salary
Figure 10 Spending
Figure 11 Text Message
Figure 12 P-Value of A
Figure 13 P-Value of B
Figure 14 P-Value of Joint

List of Tables
Table No 1 Descriptive statistics
Table No 2 Fresh product
Table No 3 Milk product
Table No 4 Frozen product
Table No 5 Grocery product
Table No 6 Detergents Paper
Table No 7 Delicatessen
Table No 8 Coefficient of variation
Table No 9 Gender and Major
Table No 10 Gender and Grad Intention
Table No 11 Gender and Employment
Table No 12 Gender and Computer
Table No 13 Gender and Major
Table No 14 Gender and Employment
Table No 15 Gender and Major
Table No 16 Gender and Grad Intention
Table No 17 Gender and Salary

Executive Summary
Problem 1:
This intermediate-level data set has 440 rows and 9 columns. It describes the clients of a
wholesale distributor and records their annual spending, in monetary units, on diverse product
categories. The data set is recommended for learning and practicing exploratory data analysis
and data visualization.

Problem 2:
This intermediate-level data set has 62 rows and 14 columns. It describes the undergraduate
students who attend CMSU. It includes the ID, gender, age, class, major, grad intention,
GPA, employment, salary, social networking, satisfaction, spending, computer, and text messages.
The data set is recommended for learning and practicing exploratory data analysis and
data visualization. CMSU created and distributed a survey of 14 questions and received responses
from 62 undergraduates.

Problem 3:
Customers may feel that they have purchased a product lacking in quality if they find moisture and
wet shingles inside the packaging. In some cases, excessive moisture can cause the granules
attached to the shingles for texture and coloring purposes to fall off the shingles, resulting in
appearance problems. Our task is to test, through hypothesis testing, whether the mean moisture
content is less than 0.35 pounds per 100 square feet.

Introduction

The purpose of this exercise is to explore the three data sets through exploratory data analysis,
using measures of central tendency and other descriptive parameters. The assignment should help
us work with summary statistics, contingency tables, conditional probabilities and
hypothesis testing.

Problem 1:
A wholesale distributor operating in different regions of Portugal has information on the annual
spending on several items in its stores across different regions and channels. The data consists
of 440 large retailers’ annual spending on 6 different varieties of products in 3 different regions
(Lisbon, Oporto, Other) and across 2 sales channels (Hotel, Retail).
1.1 Use methods of descriptive statistics to summarize data. Which Region and which Channel
spent the most? Which Region and which Channel spent the least?

The describe function in Python gives the descriptive statistics of the data, which summarize its
characteristics. They cover measures of central tendency and measures of variability: count, mean,
standard deviation, minimum, maximum, and the 25%, 50% (median) and 75% quartiles, from
which the IQR can be derived.

Table No.1 Descriptive statistics

From the above descriptive statistics we can see that there are 440 rows and 9 columns. The
dataset represents the clients of a wholesaler. It includes the spending of each client on the
various product categories.
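
As a rough illustration of how such a summary can be produced, the sketch below calls pandas `describe` on a small made-up frame. The rows and values are hypothetical and are not the project data; only the column names follow the variable list of this report.

```python
import pandas as pd

# Hypothetical miniature of the wholesale data; values are illustrative only.
df = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Other", "Lisbon", "Oporto", "Other"],
    "Fresh": [12000, 7000, 13000, 6000],
    "Milk": [3000, 9000, 2500, 11000],
})

# describe() reports count, mean, std, min, the 25%/50%/75% quartiles and max
# for every numeric column.
summary = df.describe()

# The IQR is not part of describe(); it is derived from the quartiles.
iqr = summary.loc["75%"] - summary.loc["25%"]

print(summary)
print(iqr)
```

The 50% row is the median, so no separate call is needed for it.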

Description of the variables:

Buyer/Spender: ID of the customer (Identifier).
Channel: Customer channel, Hotel or Retail (Nominal).
Region: Region to which the customer belongs (Nominal).
Fresh: Spending on Fresh products (Continuous).
Milk: Spending on Milk products (Continuous).
Grocery: Spending on Grocery products (Continuous).
Frozen: Spending on Frozen products (Continuous).
Detergent paper: Spending on Detergent paper products (Continuous).
Delicatessen: Spending on Delicatessen products (Continuous).

Figure No 1 Hotel and Retail

The Other region spent the most and the Oporto region spent the least.
The Hotel channel spent the most and the Retail channel spent the least.
The spread of expenditure across the products is shown in the bar graph below.

Figure No 2 Region and Channel


We can see that the Fresh product in the Other region through the Hotel channel incurred the
highest expenditure, and the Detergent Paper product in the Oporto region through the Retail
channel incurred the lowest expenditure.
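
One way such totals could be obtained is sketched below: sum the item columns per row, then group by Region and by Channel. The frame is a hypothetical stand-in, not the project data, so the printed winners only demonstrate the mechanics.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
df = pd.DataFrame({
    "Channel": ["Hotel", "Hotel", "Retail", "Retail"],
    "Region": ["Other", "Oporto", "Other", "Lisbon"],
    "Fresh": [100, 40, 30, 20],
    "Milk": [20, 10, 50, 40],
})
items = ["Fresh", "Milk"]

# Total spending per client, then aggregated per group.
df["Total"] = df[items].sum(axis=1)
by_region = df.groupby("Region")["Total"].sum()
by_channel = df.groupby("Channel")["Total"].sum()

print("Region  max/min:", by_region.idxmax(), by_region.idxmin())
print("Channel max/min:", by_channel.idxmax(), by_channel.idxmin())
```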

1.2 There are 6 different varieties of items that are considered. Describe and comment/explain all
the varieties across Region and Channel? Provide a detailed justification for your answer.

Solution:

a) For fresh product.

Table No.2 Fresh product


From the above Table:
1. The mean Fresh expenditure is highest in the Hotel channel of the Other region and lowest in
the Retail channel of the Lisbon region.
2. The standard deviation is highest in the Hotel channel of the Other region, where spending is
the least consistent, and lowest in the Retail channel of the Lisbon region, where it is the most
consistent.
b) For milk product.

Table No.3 Milk product

From the above Table:
1. The mean Milk expenditure is highest in the Retail channel of the Other region and lowest in
the Hotel channel of the Oporto region.
2. The standard deviation is highest in the Retail channel of the Other region, where spending is
the least consistent, and lowest in the Hotel channel of the Oporto region, where it is the most
consistent.
c) For frozen product.

Table No.4 Frozen product

From the above Table:
1. The mean Frozen expenditure is highest in the Hotel channel of the Oporto region and lowest in
the Retail channel of the Other region.
2. The standard deviation is highest in the Hotel channel of the Lisbon region, where spending is
the least consistent, and lowest in the Retail channel of the Other region, where it is the most
consistent.
d) For grocery product.

Table No.5 Grocery product

From the above Table:
1. The mean Grocery expenditure is highest in the Retail channel of the Lisbon region and lowest
in the Hotel channel of the Other region.
2. The standard deviation is highest in the Retail channel of the Oporto region, where spending is
the least consistent, and lowest in the Hotel channel of the Oporto region, where it is the most
consistent.
e) For detergent paper product.

Table No.6 Detergent paper

From the above Table:
1. The mean Detergent Paper expenditure is highest in the Retail channel of the Lisbon region and
lowest in the Hotel channel of the Other region.
2. The standard deviation is highest in the Retail channel of the Oporto region, where spending is
the least consistent, and lowest in the Hotel channel of the Oporto region, where it is the most
consistent.

f) For delicatessen product.

Table No.7 Delicatessen product

From the above Table:
1. The mean Delicatessen expenditure is highest in the Retail channel of the Oporto region and
lowest in the Hotel channel of the Lisbon region.
2. The standard deviation is highest in the Hotel channel of the Other region, where spending is
the least consistent, and lowest in the Hotel channel of the Oporto region, where it is the most
consistent.
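
Per-group means and standard deviations of this kind could be tabulated with a two-key groupby, as in the sketch below. The numbers are hypothetical and only show the shape of the result; they do not reproduce Tables 2 to 7.

```python
import pandas as pd

# Hypothetical Fresh spending, two clients per Region/Channel pair.
df = pd.DataFrame({
    "Region": ["Other", "Other", "Other", "Other",
               "Lisbon", "Lisbon", "Lisbon", "Lisbon"],
    "Channel": ["Hotel", "Hotel", "Retail", "Retail",
                "Hotel", "Hotel", "Retail", "Retail"],
    "Fresh": [10.0, 30.0, 5.0, 7.0, 12.0, 14.0, 4.0, 6.0],
})

# One row per (Region, Channel) pair with the mean and sample standard deviation.
stats_table = df.groupby(["Region", "Channel"])["Fresh"].agg(["mean", "std"])

print(stats_table)
print("Highest mean:", stats_table["mean"].idxmax())
print("Lowest mean:", stats_table["mean"].idxmin())
```

Repeating the aggregation for each of the six item columns gives one table per product.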
1.3 On the basis of a descriptive measure of variability, which item shows the most inconsistent
behavior? Which items show the least inconsistent behavior?

Solution:

Standard deviation:

The standard deviation (σ) is a measure of how dispersed the data is in relation to the mean.
A low standard deviation means the data are clustered around the mean, and a high standard
deviation indicates the data are more spread out.

Coefficient of Variation
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean.
Because it is unit-free, it allows items with different average spending to be compared.
CV = σ / μ

σ: Standard deviation
μ: Mean

Table No 8 Coefficient of variation

From the above Table we can calculate the coefficient of variation:
1. CV of Fresh = Standard deviation of Fresh / Mean of Fresh
= 12647.32 / 12000.29
= 1.05
2. CV of Milk = Standard deviation of Milk / Mean of Milk
= 7380.37 / 5796.26
= 1.27
3. CV of Grocery = Standard deviation of Grocery / Mean of Grocery
= 9503.16 / 7951.27
= 1.20
4. CV of Frozen = Standard deviation of Frozen / Mean of Frozen
= 4854.67 / 3071.93
= 1.58
5. CV of Detergent Paper = Standard deviation of Detergent Paper / Mean of Detergent Paper
= 4767.85 / 2881.49
= 1.65
6. CV of Delicatessen = Standard deviation of Delicatessen / Mean of Delicatessen
= 2820.10 / 1524.8
= 1.85

Delicatessen has the highest coefficient of variation (1.85) and therefore shows the most
inconsistent behavior. Fresh has the lowest (1.05) and shows the least inconsistent behavior.
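
The same ratios can be checked in a few lines of Python. The sketch below uses only the standard deviations and means quoted in the calculations above; the dictionary layout and variable names are illustrative choices.

```python
# (standard deviation, mean) per item, as quoted in the calculations above.
summary = {
    "Fresh": (12647.32, 12000.29),
    "Milk": (7380.37, 5796.26),
    "Grocery": (9503.16, 7951.27),
    "Frozen": (4854.67, 3071.93),
    "Detergent Paper": (4767.85, 2881.49),
    "Delicatessen": (2820.10, 1524.8),
}

# CV = standard deviation / mean
cv = {item: sd / mean for item, (sd, mean) in summary.items()}

most_inconsistent = max(cv, key=cv.get)
least_inconsistent = min(cv, key=cv.get)

for item, value in sorted(cv.items(), key=lambda kv: kv[1]):
    print(f"{item}: {value:.2f}")
print("Most inconsistent:", most_inconsistent)
print("Least inconsistent:", least_inconsistent)
```

Rounded to two decimals, Grocery comes out at 1.20 and Delicatessen at 1.85.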

1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique with the
help of detailed comments.
Solution:

Figure No 3 Box plot of Milk

Figure No 4 Box plot of Grocery

Figure No 5 Box plot of Frozen

Figure No 6 Box plot of Detergent Paper

Figure No 7 Box plot of Delicatessen

From the box plots above we can say that outliers are present in Fresh, Milk, Grocery, Frozen,
Detergent Paper and Delicatessen.
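
A box plot conventionally marks as outliers the points lying more than 1.5 times the IQR beyond the quartiles. The sketch below applies that rule numerically; the function name `iqr_outliers` and the sample values are hypothetical, and the quartile method is an assumption that may differ slightly from the plotting library used for the figures.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical spending values with one extreme client.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(sample))
```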

1.5. On the basis of your analysis, what are your recommendations for the business? How can your
analysis help the business to solve its problem? Answer from the business perspective
Solution:
The Other region spends the most and the Oporto region spends the least. The business needs to
focus on minimizing the expenditure in the Other region and bringing it down to the level of
Lisbon and Oporto.
The Hotel channel spends the most and the Retail channel spends the least.
The Fresh product in the Other region through the Hotel channel incurred the highest
expenditure, and the Detergent Paper product in the Oporto region through the Retail channel
incurred the lowest expenditure.
Outliers are present in Fresh, Milk, Frozen, Grocery, Detergent Paper and Delicatessen.

Problem 2:
The Student News Service at Clear Mountain State University (CMSU) has decided to gather data
about the undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14
questions and receives responses from 62 undergraduates (stored in the Survey data set).

2.1. For this data, construct the following contingency tables (Keep Gender as row variable)
2.1.1. Gender and Major
Solution:

Table No 9 Gender and Major


2.1.2. Gender and Grad Intention
Solution:

Table No 10 Gender and Grad Intention

2.1.3. Gender and Employment


Solution:

Table No 11 Gender and Employment


2.1.4. Gender and Computer
Solution:

Table No 12 Gender and Computer
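
Contingency tables of this kind can be built with pandas `crosstab`, keeping Gender as the row variable. The sketch below uses five hypothetical respondents rather than the 62 survey rows.

```python
import pandas as pd

# Hypothetical respondents, for illustration only.
survey = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Major": ["CIS", "Accounting", "Management", "Management", "Accounting"],
})

# Rows are Gender, columns are Major; margins=True adds row and column totals.
table = pd.crosstab(survey["Gender"], survey["Major"], margins=True)
print(table)
```

Swapping the second argument for another survey column gives the other three tables.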

2.2. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question:

Probability:
Probability is a measure of how likely an event is to occur. Many events cannot be predicted
with total certainty; we can only state the chance that they occur. Probability ranges from 0 to 1,
where 0 means the event is impossible and 1 means the event is certain. The probabilities of all
the outcomes in a sample space add up to 1.

Formula for Probability:

The probability of an event is the ratio of the number of favorable outcomes to the total number
of outcomes.
P (E) = Number of favorable outcomes / Total number of outcomes

2.2.1. What is the probability that a randomly selected CMSU student will be male?
Solution:

P (Male) = Number of male students / Total number of students
= 29 / 62 * 100
= 46.77 %

Probability that a randomly selected student will be male: 46.77 %

2.2.2. What is the probability that a randomly selected CMSU student will be female?
Solution:

P (Female) = Number of female students / Total number of students
= 33 / 62 * 100
= 53.23 %

Probability that a randomly selected student will be female: 53.23 %
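
As a quick check, the two marginal probabilities can be computed from the gender counts that the later sections of this report rely on (29 male and 33 female students out of 62). The variable names are illustrative.

```python
# Gender counts used in the conditional-probability sections of this report.
counts = {"Male": 29, "Female": 33}
total = sum(counts.values())

# P(E) = favorable outcomes / total outcomes
probabilities = {gender: n / total for gender, n in counts.items()}

for gender, p in probabilities.items():
    print(f"P({gender}) = {p:.4f} or {p * 100:.2f} %")
```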

2.3. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question:

Table No 13 Gender and Major

2.3.1. Find the conditional probability of different majors among the male students in CMSU.
Solution:

Among MALE students:

Probability of Accounting = 4 / 29
= 0.1379 or 13.79%

Probability of CIS = 1 / 29
= 0.0345 or 3.45%

Probability of Economic/Finance = 4 / 29
= 0.1379 or 13.79%

Probability of International Business = 2 / 29
= 0.0690 or 6.90%

Probability of Management = 6 / 29
= 0.2069 or 20.69%

Probability of other = 4 / 29
= 0.1379 or 13.79%

Probability of Retail/Marketing = 5 / 29
= 0.1724 or 17.24%

Probability of Undecided = 3 / 29
= 0.1034 or 10.34%

2.3.2 Find the conditional probability of different majors among the female students of CMSU.
Solution:

Among FEMALE candidates:

Probability of Accounting = 3 / 33
= 0.0909 or 9.09%

Probability of CIS = 3 / 33
= 0.0909 or 9.09%

Probability of Economic/Finance = 7 / 33
= 0.2121 or 21.21%

Probability of International Business = 4 / 33
= 0.1212 or 12.12%

Probability of Management = 4 / 33
= 0.1212 or 12.12%

Probability of other = 3 / 33
= 0.0909 or 9.09%

Probability of Retail/Marketing = 9 / 33
= 0.2727 or 27.27%

Probability of Undecided = 0 / 33
= 0.00 or 0%
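
The conditional probabilities above are each major's count divided by the number of students of that gender. A minimal sketch using the counts quoted in 2.3.1 and 2.3.2 follows; the helper name `conditional` is illustrative.

```python
# Major counts per gender, as used in the calculations above.
male = {"Accounting": 4, "CIS": 1, "Economic/Finance": 4,
        "International Business": 2, "Management": 6, "Other": 4,
        "Retail/Marketing": 5, "Undecided": 3}
female = {"Accounting": 3, "CIS": 3, "Economic/Finance": 7,
          "International Business": 4, "Management": 4, "Other": 3,
          "Retail/Marketing": 9, "Undecided": 0}

def conditional(counts):
    """P(major | gender) = count of the major / number of students of that gender."""
    group_size = sum(counts.values())
    return {major: n / group_size for major, n in counts.items()}

male_cond = conditional(male)
female_cond = conditional(female)

for major in male:
    print(f"{major}: male {male_cond[major]:.4f}, female {female_cond[major]:.4f}")
```

Within each gender the conditional probabilities add up to 1, which is a useful sanity check.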

2.4. Assume that the sample is a representative of the population of CMSU. Based on the data,
answer the following question:

2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
Solution:

P (Male and intends to graduate) = P (Intends to graduate | Male) x P (Male)
= (17 / 29) * (29 / 62)
= 0.2742 or 27.42%

The probability that a randomly chosen student is male and intends to graduate is 0.2742 or 27.42%.

2.4.2 Find the probability that a randomly selected student is a female and does NOT have a
laptop.
Solution:

P (Female and does not have a laptop) = P (No laptop | Female) x P (Female)
= (4 / 33) * (33 / 62)
= 0.0645 or 6.45%

The probability that a randomly selected student is female and does not have a laptop is 0.0645 or 6.45%.

2.5. Assume that the sample is representative of the population of CMSU. Based on the data,
answer the following question:
2.5.1. Find the probability that a randomly chosen student is a male or has full-time employment?
Solution:

Table No 14 Gender and Employment

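P (Male and full-time employment) = P (Full-time employment | Male) x P (Male)
= (7 / 29) * (29 / 62)
= 0.1129 or 11.29%

This is the joint probability that a student is male and has full-time employment. The question
asks for male or full-time employment, which follows from the addition rule:

P (Male or full-time employment) = P (Male) + P (Full-time employment) - P (Male and full-time employment)
= (29 / 62) + P (Full-time employment) - (7 / 62)

where P (Full-time employment) is the number of full-time employed students in Table No 14
divided by 62.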

2.5.2. Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.
Solution:

Table No 15 Gender and Major

P (International business or management | Female)
= Number of female students majoring in international business or management / Number of female students
= (4 + 4) / 33
= 0.2424 or 24.24%

Given that a randomly chosen student is female, the probability that she is majoring in
international business or management is 0.2424 or 24.24%.
2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The
Undecided students are not considered now and the table is a 2x2 table. Do you think the graduate
intention and being female are independent events?
Solution:

Table No 16 Gender and Grad intention

For 2 events to be independent, following condition is to be satisfied

P (A ∩ B) = P (A) * P (B)

So, P (Graduate intention ∩ Female) = P (Graduate intention) * P (Female)

P (Female) = 20 / 40 = 0.5

P (Graduate intention) = 28 / 40 = 0.7

P (Graduate intention) * P (Female) = 0.5× 0.7 = 0.35

P (Graduate intention ∩ Female) = 11 / 40 = 0.275

P (Graduate intention ∩ Female) != P (Graduate intention) * P (Female)

Since P (Graduate intention ∩ Female) = 0.275 is not equal to P (Graduate intention) * P (Female)
= 0.35, graduate intention and being female are not independent events.
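
The independence condition can be checked directly from the counts of the 2x2 table. The sketch below uses the figures quoted above (40 students, 20 female, 28 intending to graduate, 11 both); the function name `is_independent` and the tolerance are illustrative assumptions.

```python
def is_independent(n_total, n_a, n_b, n_both, tol=1e-9):
    """Compare P(A and B) with P(A) * P(B) for counts from a contingency table."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_both = n_both / n_total
    return abs(p_both - p_a * p_b) <= tol, p_both, p_a * p_b

independent, p_joint, p_product = is_independent(40, 20, 28, 11)
print(independent, p_joint, p_product)
```

In sample data an exact match is rare, so a chi-square test of independence would be the more formal way to judge whether the gap is larger than chance alone would explain.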

2.7. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages.
Answer the following questions based on the data

2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is less than 3?
Solution:

Table No 16 Gender and GPA

Probability that his/her GPA is less than 3 = 17 / 62
= 0.2742 or 27.42%

The probability that a randomly chosen student has a GPA of less than 3 is 0.2742 or 27.42%.

2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
Solution:

Table No 17 Gender and Salary

P (Earns 50 or more | Male)
= Number of male students earning 50 or more / Number of male students
= 14 / 29
= 0.4828 or 48.28%

Given that a randomly selected student is male, the probability that he earns 50 or more is
0.4828 or 48.28%.

P (Earns 50 or more | Female)
= Number of female students earning 50 or more / Number of female students
= 13 / 33
= 0.3939 or 39.39%

Given that a randomly selected student is female, the probability that she earns 50 or more is
0.3939 or 39.39%.
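
A conditional probability divides by the size of the group, while a joint probability divides by the whole sample. The sketch below shows both for the salary counts used above (14 of 29 male students and 13 of 33 female students earn 50 or more, out of 62 students); the helper name is illustrative.

```python
def joint_and_conditional(n_total, n_group, n_group_and_event):
    """Return (P(group and event), P(event | group)) from raw counts."""
    joint = n_group_and_event / n_total
    conditional = n_group_and_event / n_group
    return joint, conditional

male_joint, male_cond = joint_and_conditional(62, 29, 14)
female_joint, female_cond = joint_and_conditional(62, 33, 13)

print(f"Male:   joint {male_joint:.4f}, conditional {male_cond:.4f}")
print(f"Female: joint {female_joint:.4f}, conditional {female_cond:.4f}")
```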

2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages. For each of them comment whether they follow a normal
distribution. Write a note summarizing your conclusions.
Solution:

Figure No 8 GPA

From the above plot we can say that GPA is approximately normally distributed, as about 68 % of
the data lies within one standard deviation of the mean (between μ - σ and μ + σ).

Figure No 9 Salary

From the above plot we can say that Salary is approximately normally distributed, as about 68 %
of the data lies within one standard deviation of the mean (between μ - σ and μ + σ).

Figure No 10 Spending

From the above plot we can see that Spending is skewed to the right, with outliers in the upper
tail, so it does not closely follow a normal distribution.

Figure No 11 Text Message

From the above plot we can see that Text Messages is skewed to the right, with outliers in the
upper tail, so it does not closely follow a normal distribution.
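
The two visual criteria used above, the share of data within one standard deviation and the direction of skew, can also be computed. The sketch below is one possible numeric check on hypothetical values, not the survey data; the moment-based skewness formula is a swapped-in technique that the report itself does not use.

```python
import statistics

def share_within_one_sigma(values):
    """Fraction of values between mean - sd and mean + sd (about 0.68 for normal data)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [v for v in values if mean - sd <= v <= mean + sd]
    return len(inside) / len(values)

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5; a positive value means a longer right tail."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # hypothetical, bell-like
right_skewed = [1, 1, 1, 2, 2, 3, 10]     # hypothetical, one large value

print(share_within_one_sigma(symmetric), sample_skewness(symmetric))
print(share_within_one_sigma(right_skewed), sample_skewness(right_skewed))
```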

Problem 3:
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the
amount of moisture the shingles contain when they are packaged. Customers may feel that they
have purchased a product lacking in quality if they find moisture and wet shingles inside the
packaging. In some cases, excessive moisture can cause the granules attached to the shingles for
texture and coloring purposes to fall off the shingles resulting in appearance problems. To monitor
the amount of moisture present, the company conducts moisture tests. A shingle is weighed and
then dried. The shingle is then reweighed, and based on the amount of moisture taken out of the
product, the pounds of moisture per 100 square feet are calculated. The company would like to
show that the mean moisture content is less than 0.35 pounds per 100 square feet.
The file (A & B shingles.csv) includes 36 measurements (in pounds per 100 square feet) for A
shingles and 31 for B shingles.

3.1 Do you think there is evidence that the mean moisture contents in both types of shingles are
within the permissible limits? State your conclusions clearly showing all steps.
Solution:

A)

Step 1: State the null and alternate hypothesis

H0: Mean moisture content is equal to 0.35 pounds per 100 square feet in sample A

Ha: Mean moisture content is less than 0.35 pounds per 100 square feet in sample A

Step 2: Decide the significance level

Alpha = 0.05 or 5%

Step 3: Identify the test Statistic

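We do not know the population standard deviation and n = 36 in A, so we cannot perform a Z-test;
we use the 1-sample t-test instead. The formula for the t-test is

t = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean, μ0 = 0.35 is the hypothesized mean, s is the sample standard
deviation and n is the sample size.
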
Step 4: Finding the P value using python

Figure No 12 P value
P Value is 0.07
Step 5: Comparing P value with Alpha
P Value > Alpha (0.05 or 5%)
Therefore we fail to reject the null hypothesis

Step 6: Conclusion of Test


Based on the hypothesis test performed on the given sample of 36 observations, at 95 % confidence
there is not enough evidence to conclude that the mean moisture content in sample A is less than
0.35 pounds per 100 square feet; therefore we fail to reject the null hypothesis.
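
The six steps can be sketched with `scipy.stats.ttest_1samp`. The readings below are hypothetical and are not the A shingles data, so the resulting p value differs from the one reported in the figure; the one-sided `alternative="less"` argument assumes SciPy 1.6 or later.

```python
import math
import statistics
from scipy import stats

# Hypothetical moisture readings (pounds per 100 square feet), NOT the project data.
sample = [0.30, 0.33, 0.36, 0.29, 0.34, 0.31, 0.35, 0.32]
mu0 = 0.35  # permissible limit from the problem statement

# Test statistic by hand: t = (sample mean - mu0) / (s / sqrt(n))
n = len(sample)
t_manual = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

# One-sided test with Ha: mean < mu0.
result = stats.ttest_1samp(sample, popmean=mu0, alternative="less")
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(t_stat, p_value, decision)
```

A p value above alpha leads to "fail to reject H0" rather than "accept H0", because the test only measures evidence against the null hypothesis.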

B)
Solution:

Step 1: State the null and alternate hypothesis

H0: Mean moisture content is equal to 0.35 pounds per 100 square feet in sample B

Ha: Mean moisture content is less than 0.35 pounds per 100 square feet in sample B

Step 2: Decide the significance level

Alpha = 0.05 or 5%

Step 3: Identify the test Statistic

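We do not know the population standard deviation and n = 31 in B, so we cannot perform a Z-test;
we use the 1-sample t-test instead. The formula for the t-test is

t = (x̄ - μ0) / (s / √n)

where x̄ is the sample mean, μ0 = 0.35 is the hypothesized mean, s is the sample standard
deviation and n is the sample size.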

Step 4: Finding the P value using python

Figure No 13 P value

P value is 0.002
Step 5: Comparing P value with Alpha
As P Value < Alpha (0.05 or 5%)
Therefore we reject the null hypothesis
Step 6: Conclusion of Test
Based on the hypothesis test performed on the given sample of 31 observations, at 95 % confidence
there is enough evidence to conclude that the mean moisture content in sample B is less than 0.35
pounds per 100 square feet. If the true mean were 0.35 pounds per 100 square feet, the probability
of observing a sample of 31 shingles with a mean moisture content as low as the one obtained, or
lower, would be 0.002.

3.2 Do you think that the population mean for shingles A and B are equal? Form the hypothesis
and conduct the test of the hypothesis. What assumption do you need to check before the test for
equality of means is performed?

Solution:

Step 1: State the null and alternate hypothesis

H0: The mean moisture content of shingles A is equal to the mean moisture content of shingles B
(μA = μB)

Ha: The mean moisture content of shingles A is not equal to the mean moisture content of
shingles B (μA ≠ μB)

To perform the hypothesis test, the following assumptions must hold:

1. The variables must follow a continuous distribution.
2. The samples must be randomly collected from the population.
3. The underlying distribution must be normal. Alternatively, if the data is continuous but
cannot be assumed to follow a normal distribution, a reasonably large sample size is
required. The CLT asserts that the sample mean approximately follows a normal distribution,
even if the population distribution is not normal, when the sample size is at least 30.
4. For the 2-sample t-test, the population variances of the 2 distributions must be equal.

Step 2: Decide the significance level

Alpha = 0.05 or 5%

Step 3: Identify the test Statistic

We have two samples and we do not know the population standard deviation. The sample sizes are
n = 36 in A and n = 31 in B. Both samples are larger than 30, so we use the t distribution and
the t test statistic for a two-sample unpaired test.

Step 4: Finding the P value using python


We use scipy.stats.ttest_ind to calculate the t-test for the means of two independent
samples of scores, given the two sample observations. This function returns the t statistic and
the two-tailed p value.

This is a two-sided test for the null hypothesis that 2 independent samples have identical average
(expected) values. By default, the test assumes that the populations have identical variances.
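
A minimal sketch of this step follows, again on hypothetical readings rather than the A and B shingles data. The equal-variance assumption is checked first with Levene's test, which is a swapped-in technique that the report does not mention; `equal_var` then selects between the pooled t-test and Welch's t-test.

```python
from scipy import stats

# Hypothetical readings (pounds per 100 square feet), NOT the project data.
a = [0.30, 0.33, 0.36, 0.29, 0.34, 0.31, 0.35, 0.32]
b = [0.27, 0.31, 0.26, 0.30, 0.28, 0.33, 0.25, 0.29]

# Levene's test: H0 is that the two populations have equal variances.
lev_stat, lev_p = stats.levene(a, b)
equal_var = bool(lev_p > 0.05)

# Pooled-variance t-test if the variances look equal, Welch's t-test otherwise.
t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(lev_p, t_stat, p_value, decision)
```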

Figure No 14 P-value

P value is 0.202

Step 5: Comparing P value with Alpha

As P Value > Alpha (0.05 or 5%)
Therefore we fail to reject the null hypothesis

Step 6: Conclusion of Test

Based on the hypothesis test performed on the given samples, at 95 % confidence there is not
enough evidence to conclude that the population means of shingles A and B are not equal, so we
fail to reject the null hypothesis that the mean moisture content in A is equal to the mean
moisture content in B. This conclusion rests on the assumptions that the two populations are
normally distributed and that their variances are the same.

THANK YOU
