Assignment 2 - EPGCOM-10-006

EPGCOM-10 Course: Six Sigma
Submitted by:
Nilay Singh Thakur, EPGCOM-10-029
Akash Suryavanshi, EPGCOM-10-003
Anuj Kumar, EPGCOM-10-006
Assignment - 2
Question 1: Two catalysts are being analyzed to determine how they affect the mean yield of
a chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable.
Since catalyst 2 is cheaper, it should be adopted, providing it does not change the process
yield. An experiment is run in the pilot plant and the results are in the data shown below. Is
there any difference between the mean yields? Assume equal variance. Also construct a box
plot for the yield data and do the normality checks. Provide your interpretation.
Observation
Number       Catalyst 1   Catalyst 2
1            95.8         89.19
2            90.2         90.95
3            94.2         90.46
4            95.2         93
5            91.79        97.19
6            88.07        97.04
7            93.72        91.07
8            87.3         92.5
Solution 1: We assume that X (Catalyst 1) and Y (Catalyst 2) have normal distributions N(μX,
sigma^2X) and N(μY, sigma^2Y), respectively.
(a) Test H0: μX = μY against H1: μX not equal to μY at alpha = 0.05, assuming that sigma^2X =
sigma^2Y.
(b) Test H0: sigma^2X = sigma^2Y against H1: sigma^2X not equal to sigma^2Y at significance
level alpha = 0.05.
Variable    N  N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Catalyst-1  8  0   92.03  1.14     3.24   87.30    88.60  92.75   94.95  95.80
Catalyst-2  8  0   92.67  1.05     2.98   89.19    90.58  91.78   96.03  97.19
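From the given data, we find that x̄ = 92.03, sx = 3.24, ȳ = 92.67, sy = 2.98, with n = m = 8.
The difference in sample means is 92.67 - 92.03 = 0.64.

(a) The pooled standard deviation is
Sp = Sqrt((7*(3.24)^2 + 7*(2.98)^2)/14)
Sp = 3.11
So the value of the test statistic is
t = (92.035 - 92.675) / (3.11 * Sqrt(1/8 + 1/8))
t = -0.41
Since |t| = 0.41 < t a/2 (n+m-2) = t0.025 (14) = 2.145, we do not reject H0 at alpha = 0.05.
There is no evidence of a difference between the mean yields of the two catalysts.
(b) Since S^2x / S^2y = (3.24)^2 / (2.98)^2 = 1.18 < F a/2 (n-1, m-1) = F0.025 (7, 7) = 4.99,
we do not reject H0: sigma^2X = sigma^2Y, so the equal-variance assumption used in (a) is reasonable.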
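The hand calculation for Solution 1 can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the critical values 2.145 and 4.99 are the table values quoted in the solution, and the variable names are illustrative rather than taken from the assignment.

```python
import math
import statistics

# Yield data from the Question 1 table
catalyst_1 = [95.8, 90.2, 94.2, 95.2, 91.79, 88.07, 93.72, 87.3]
catalyst_2 = [89.19, 90.95, 90.46, 93, 97.19, 97.04, 91.07, 92.5]

n, m = len(catalyst_1), len(catalyst_2)
mean_x, mean_y = statistics.mean(catalyst_1), statistics.mean(catalyst_2)
var_x, var_y = statistics.variance(catalyst_1), statistics.variance(catalyst_2)

# Pooled standard deviation: variances weighted by their degrees of freedom
sp = math.sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))

# Two-sample t statistic under the equal-variance assumption
t_stat = (mean_x - mean_y) / (sp * math.sqrt(1 / n + 1 / m))

# Variance ratio for the equal-variance check
f_ratio = var_x / var_y

T_CRIT = 2.145  # t0.025(14), as quoted in the solution
F_CRIT = 4.99   # F0.025(7, 7), as quoted in the solution

print(f"sp = {sp:.3f}, t = {t_stat:.3f}, F = {f_ratio:.3f}")
print("reject equal means:", abs(t_stat) > T_CRIT)
print("reject equal variances:", f_ratio > F_CRIT)
```

Working from the unrounded data gives values that differ slightly in the third decimal from a calculation based on rounded standard deviations, but the conclusions are unchanged.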

Question 2: Management of a soft drink bottling company wants to develop a method for
allocating delivery costs to the customers. Although one cost is clearly related to travel time
within a particular route, another variable cost reflects the time required to unload the cases
of soft drink at the delivery point. A sample of 20 deliveries within a delivery was selected.
The delivery times and the number of cases delivered are given below.
a) Compute regression coefficients along with their 95% confidence limits and 95% prediction
limits line. Provide their meaning.
b) Predict the delivery time of 150 cases?
c) Should you use the model to predict the delivery time for a customer who is receiving 600
cases of soft drinks? Why or Why not?
d) Determine r^2 and r^2(adj). What is their meaning?
Solution 2: Scatter plot

A)
Regression model:
Regression Equation
Delivery Time = 26.05 + 0.13019 Number of cases

Coefficients
Term             Coef     SE Coef  T-Value  P-Value  VIF
Constant         26.05    1.19     21.84    0.000
Number of cases  0.13019  0.00624  20.86    0.000    1.00

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.33084  96.03%  95.81%     95.13%

Analysis of Variance
Source             DF  Adj SS   Adj MS   F-Value  P-Value
Regression          1  2364.01  2364.01  435.14   0.000
  Number of cases   1  2364.01  2364.01  435.14   0.000
Error              18    97.79     5.43
Total              19  2461.80

Fits and Diagnostics for Unusual Observations
Obs  Delivery Time  Fit     Resid  Std Resid
13   57.200         52.219  4.981  2.20  R
R  Large residual
B0 = 26.05
B1 = 0.13019
Regression equation:
Y = 26.05 + 0.13019 X, where X is the number of cases
B0 is the theoretical time it would take to make a delivery consisting of zero cases of
beverages. B1 is the expected incremental increase in the delivery time for each
additional case of beverages.
b) Predict the average delivery time for a customer who is receiving 150 cases.
Y = 26.05 + 0.13019 X
Y = 26.05 + 0.13019 x 150
= 45.58 min.
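The prediction, and the 95% confidence limits for the coefficients requested in part (a), can be sketched from the Minitab coefficient table (Coef 26.05 and 0.13019, SE Coef 1.19 and 0.00624). The critical value t0.025(18) = 2.101 is a standard table value that does not appear in the source and is an assumption of this sketch, as are the function and variable names.

```python
# Coefficients and standard errors copied from the Minitab output
B0, SE_B0 = 26.05, 1.19
B1, SE_B1 = 0.13019, 0.00624

T_CRIT = 2.101  # t0.025 with 18 error degrees of freedom (table value, assumed)


def predict_delivery_time(cases):
    """Fitted delivery time for a given number of cases."""
    return B0 + B1 * cases


def confidence_limits(coef, se, t_crit=T_CRIT):
    """Two-sided confidence limits: coef +/- t_crit * SE."""
    margin = t_crit * se
    return coef - margin, coef + margin


pred_150 = predict_delivery_time(150)
b0_limits = confidence_limits(B0, SE_B0)
b1_limits = confidence_limits(B1, SE_B1)

print(f"predicted time for 150 cases: {pred_150:.2f}")
print(f"95% limits for B0: ({b0_limits[0]:.2f}, {b0_limits[1]:.2f})")
print(f"95% limits for B1: ({b1_limits[0]:.4f}, {b1_limits[1]:.4f})")
```

Prediction limits for an individual delivery also need the raw case counts (their mean and sum of squares), which are not listed here, so they are left out of the sketch.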
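C) No. The data used to create the model did not include any order of more
than 340 cases, so a prediction for 600 cases would extrapolate well beyond the
observed range, where the linear relationship has not been verified.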
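D) R-square = 1 - (SSE/TSS)
= 1 - (97.79/2461.80)
= 1 - 0.0397
= 0.9603
So 96.03% of the variation in delivery time can be explained by the variation in the number of
cases delivered.
R-square(adj) = 1 - (SSE/(n-2)) / (TSS/(n-1))
= 1 - (97.79/18) / (2461.80/19)
= 0.9581
R-square(adj) corrects R-square for the number of predictors in the model; with a single
predictor it is only slightly lower (95.81%).
Coefficient of correlation:
R = Sqrt(R-square)
= Sqrt(0.9603)
= 0.980 (positive, because the slope is positive)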
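The R-square figures can be reproduced directly from the Analysis of Variance table (Error SS 97.79 with 18 DF, Total SS 2461.80 with 19 DF). A minimal sketch, with illustrative variable names:

```python
import math

# Values from the Analysis of Variance table
SS_ERROR, DF_ERROR = 97.79, 18
SS_TOTAL, DF_TOTAL = 2461.80, 19

# Share of total variation explained by the regression
r_sq = 1 - SS_ERROR / SS_TOTAL

# Adjusted version compares mean squares instead of sums of squares
r_sq_adj = 1 - (SS_ERROR / DF_ERROR) / (SS_TOTAL / DF_TOTAL)

# Correlation coefficient; the sign follows the (positive) slope
r = math.sqrt(r_sq)

print(f"R-sq = {r_sq:.4f}, R-sq(adj) = {r_sq_adj:.4f}, r = {r:.4f}")
```

Both values agree with the Model Summary line of the Minitab output (96.03% and 95.81%).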
Question 3: The data contains prices for two tickets (in $), with online service charges, large
popcorn, and two medium soft drinks at a sample of six theatre chains. 38.25, 35, 35, 40, 33,
41
a) At the 0.05 level of significance, is there evidence that mean price for two tickets, is
different from $35?
b) Determine the p-value and interpret its meaning
c) What assumption about the population distribution is needed in (a) and (b)
d) Do you think the assumption stated in (c) is seriously violated?
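Solution 3:
a)
H0: μ = 35 against H1: μ not equal to 35. By the 1-sample t-test, the sample mean is 37.04 and
the p-value is 0.177. Since 0.177 > 0.05, we do not reject H0: at the 0.05 level of significance
there is no evidence that the mean price for two tickets is different from $35.
b)
The p-value is 0.177. It means that, if the true mean price were $35, a sample mean at least as
far from $35 as 37.04 would be observed about 17.7% of the time. Since 0.177 > 0.05, we fail to
reject the null hypothesis; this does not prove that the null hypothesis is true.
c)
We assumed in (a) and (b) that the population of prices is normally distributed.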
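The one-sample t statistic behind Solution 3 can be checked by hand or with a few lines of code. This sketch uses only the standard library; the critical value t0.025(5) = 2.571 is a table value that is not quoted in the source and is an assumption here.

```python
import math
import statistics

# Prices from Question 3
prices = [38.25, 35, 35, 40, 33, 41]
MU_0 = 35       # hypothesized mean price
T_CRIT = 2.571  # t0.025 with n - 1 = 5 degrees of freedom (table value, assumed)

n = len(prices)
sample_mean = statistics.mean(prices)
std_error = statistics.stdev(prices) / math.sqrt(n)

# Distance of the sample mean from 35, in standard-error units
t_stat = (sample_mean - MU_0) / std_error
reject_h0 = abs(t_stat) > T_CRIT

print(f"mean = {sample_mean:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The statistic falls inside the critical region boundaries, which is consistent with the reported p-value of 0.177 being larger than 0.05.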
Question 4: In a study on the effectiveness of the synthetic automobile fuels, two factors are
of importance. Factor A is an additive that is to be tested at two levels and factor B is a
catalyst for which 2 levels have to be tested. Twenty automobiles are randomly selected for
the study and each of 4 treatments is randomly used in five different automobiles. The
efficiency ratings in percentages are given below.
Setup DoE and analyze using Minitab software.
i) What do you mean by coded and un-coded analysis?
ii) Write the transfer function
iii) Explain the terms such as effect, p-value and Seq SS
iv) What are your interpretations of graphs?
v) What factors and levels you consider as the best to maximize efficiency?
Solution 4:
i)
Coded and uncoded analysis:
In coded units, Level 1 and Level 2 of both factors (additive and catalyst) are treated as the
low level and the high level and are written as -1 and +1. In uncoded units, the factors keep
their original level labels.

Uncoded factors:
SL no  Additive  Catalyst  Efficiency

1      Level-1   Level-2   64
10     Level-2   Level-2   68
Coded factors:
SL no Additive Catalyst Efficiency

1 -1 1 64
2 1 -1 50
3 -1 -1 72
4 -1 1 62
5 -1 -1 70
6 1 -1 58
7 1 1 74
8 1 1 70
9 -1 1 59
10 1 1 68
11 1 -1 46
12 1 1 68
13 -1 1 50
14 1 -1 53
15 -1 -1 65
16 -1 -1 75
17 -1 1 68
18 1 -1 56
19 -1 -1 67
20 1 1 70
R-sq = (76.05+84.05+884.45)/1405.75
= 0.7431
So 74.31% of the variation in the experiment is accounted for by the model and only 25.69% is
error; the model describes the data reasonably well.
For the linear terms, additive and catalyst, the p-value is > 0.05. This indicates there is not
sufficient statistical evidence that either additive or catalyst has an effect on its own; the
main effects are not significant.
For the 2-way interaction, additive*catalyst, the p-value is < 0.05. This indicates there is
sufficient statistical evidence that additive and catalyst have an impact in combination; the
interaction effect is statistically significant.
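The sums of squares used in the R-sq calculation (76.05, 84.05 and 884.45) can be reproduced from the coded table. The sketch below assumes A denotes the additive and B the catalyst, and uses the balanced two-level identity SS = N * (effect/2)^2; the coded coefficients it prints are derived by this script, not copied from the authors' Minitab session.

```python
# (additive, catalyst, efficiency) rows copied from the coded table
runs = [
    (-1, 1, 64), (1, -1, 50), (-1, -1, 72), (-1, 1, 62), (-1, -1, 70),
    (1, -1, 58), (1, 1, 74), (1, 1, 70), (-1, 1, 59), (1, 1, 68),
    (1, -1, 46), (1, 1, 68), (-1, 1, 50), (1, -1, 53), (-1, -1, 65),
    (-1, -1, 75), (-1, 1, 68), (1, -1, 56), (-1, -1, 67), (1, 1, 70),
]

n_runs = len(runs)
grand_mean = sum(y for _, _, y in runs) / n_runs


def effect(pairs):
    """Mean response at +1 minus mean response at -1 for one contrast column."""
    high = [y for sign, y in pairs if sign == 1]
    low = [y for sign, y in pairs if sign == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {
    "A": effect([(a, y) for a, b, y in runs]),
    "B": effect([(b, y) for a, b, y in runs]),
    "AB": effect([(a * b, y) for a, b, y in runs]),
}

# Balanced two-level design: SS = N * (effect / 2)^2
ss = {term: n_runs * (e / 2) ** 2 for term, e in effects.items()}
ss_total = sum((y - grand_mean) ** 2 for _, _, y in runs)
r_sq = sum(ss.values()) / ss_total

# Coded-unit coefficients are half the effects; the constant is the grand mean
coefs = {term: e / 2 for term, e in effects.items()}

print("effects:", effects)
print("sums of squares:", ss, "total:", ss_total)
print(f"R-sq = {r_sq:.4f}")
print(f"coded fit: {grand_mean} + {coefs['A']:.2f} A + {coefs['B']:.2f} B + {coefs['AB']:.2f} A*B")
```

The interaction effect is several times larger than either main effect, which matches the significance pattern described in the text.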
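Main effect plot:
Taken alone, the main effects plot favours additive at -1 (Level 1) and catalyst at +1 (Level 2)
to maximize the efficiency. Because the interaction is significant, the best setting should be
read from the interaction plot instead.
Interaction effect plot:
When catalyst is kept fixed at Level 1 and additive is changed from Level 1 to Level 2, then
efficiency reduces from high to low.
When catalyst is kept fixed at Level 2 and additive is changed from Level 1 to Level 2, then
efficiency increases from low to high.
From the coded table, the highest treatment means are additive Level 2 with catalyst Level 2
(70.0) and additive Level 1 with catalyst Level 1 (69.8).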
Question 5: The following data represent the nationwide highest yield of different types of
Current Deposits (CD)
a) At the 0.05 level of significance, is there evidence of a difference in the mean yields of
different accounts?
b) State the hypothesis used here.
c) What do you mean by pooled standard deviation?
Solution 5:
The data describe the mean yields of different accounts.
b)
Hypotheses used here:
Null hypothesis: All means are equal
Alternate hypothesis: Not all means are equal
c)
Pooled SD:
The pooled standard deviation is a weighted average of the standard deviations of two or more
groups. The group variances are averaged, weighted by their degrees of freedom, so that more
"weight" is given to larger sample sizes; the square root of that average is the pooled SD.
Once the pooled standard deviation has been calculated, SD pooled is used in place of SD1 and
SD2 in the formula for the standard error, along with an updated degrees of freedom formula
(df = n1 + n2 - 2 for two groups).
For this question, pooled SD is 0.0711056
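The definition of the pooled SD extends to any number of groups. Since the yield data for this question are not listed in the text, the sketch below demonstrates a hypothetical helper, `pooled_sd`, on the two catalyst samples from Question 1 instead.

```python
import math
import statistics


def pooled_sd(groups):
    """Pooled SD: square root of the df-weighted average of group variances."""
    df_total = sum(len(g) - 1 for g in groups)
    weighted_var = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    return math.sqrt(weighted_var / df_total)


# Demonstration data: the catalyst yields from Question 1
catalyst_1 = [95.8, 90.2, 94.2, 95.2, 91.79, 88.07, 93.72, 87.3]
catalyst_2 = [89.19, 90.95, 90.46, 93, 97.19, 97.04, 91.07, 92.5]

pooled = pooled_sd([catalyst_1, catalyst_2])
print(f"pooled SD = {pooled:.3f}")
```

With equal group sizes the weights are equal, so the pooled variance is simply the mean of the group variances.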
Question 6: The following is a set of data from a sample of n=11 items. Get all descriptive
statistics from Minitab. 21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54
What are the 95% confidence limits for the mean? Write the interpretation.
What are the meanings of "skewness" and "kurtosis"?
Do you believe the underlying distribution for these data is normal?
Solution 6: Below is the Descriptive analysis for the provided data
Statistics
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00

N   Mean   StDev  SE Mean  95% CI for μ
11  26.82  14.18  4.27     (17.29, 36.34)
μ: mean of Samples
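So the 95% confidence limits for the mean are 17.29 and 36.34: we are 95% confident that the
population mean lies between 17.29 and 36.34. This is an interval for the mean, not a range
that contains 95% of the data.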
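The confidence limits can be recomputed from the raw sample as mean +/- t * SE. In this sketch the critical value t0.025(10) = 2.2281 is a table value that is not shown in the Minitab output and is an assumption; the variable names are illustrative.

```python
import math
import statistics

# Sample from Question 6
samples = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]
T_CRIT = 2.2281  # t0.025 with n - 1 = 10 degrees of freedom (table value, assumed)

n = len(samples)
sample_mean = statistics.mean(samples)
std_error = statistics.stdev(samples) / math.sqrt(n)

# Two-sided 95% confidence limits for the population mean
margin = T_CRIT * std_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"mean = {sample_mean:.2f}, SE = {std_error:.2f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```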
Comparison Chart of Skewness and Kurtosis:
Meaning
  Skewness: Skewness refers to the tendency of a distribution that determines its symmetry
  about the mean.
  Kurtosis: Kurtosis means the measure of the respective sharpness of the curve, in the
  frequency distribution.
Measure for
  Skewness: Degree of lopsidedness in the distribution.
  Kurtosis: Degree of tailedness in the distribution.
What is it?
  Skewness: It is an indicator of lack of equivalence in the frequency distribution.
  Kurtosis: It is the measure of data, which is either peaked or flat in relation to the
  normal distribution.
Represents
  Skewness: Amount and direction of the skew.
  Kurtosis: How tall and sharp the central peak is.
Skewness and kurtosis of given sample data
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum  Skewness  Kurtosis
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00    0.47      0.16
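The reported skewness (0.47) and kurtosis (0.16) can be reproduced with the bias-adjusted sample formulas. That these are the formulas behind the Minitab figures is an assumption of this sketch, made because they give the same values for this sample; the function names are illustrative.

```python
import statistics

# Sample from Question 6
samples = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]


def standardized(values):
    """Deviations from the mean in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def sample_skewness(values):
    """Adjusted skewness: n / ((n-1)(n-2)) * sum(z^3)."""
    n = len(values)
    z = standardized(values)
    return n / ((n - 1) * (n - 2)) * sum(zi ** 3 for zi in z)


def sample_excess_kurtosis(values):
    """Adjusted excess kurtosis; approximately 0 for normal data."""
    n = len(values)
    z = standardized(values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(zi ** 4 for zi in z) - correction


skew = sample_skewness(samples)
kurt = sample_excess_kurtosis(samples)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

A small positive skewness points to a slightly longer right tail, and an excess kurtosis near zero points to tails similar to those of a normal distribution.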
The data can be considered normal on the following evidence:

Statistics
Variable  Mean   Median  Skewness  Kurtosis
Samples   26.82  27.00   0.47      0.16
1. The p-value read from the graph (1) is greater than 0.05, so we do not reject the null
hypothesis that the data are normally distributed.
2. There is not much difference between the mean and the median.
3. Kurtosis is near zero, which shows that the tails and peak are close to those of a normal
distribution.
4. Skewness is also close to zero, which shows that the data are roughly symmetrical.
So we can say that the data are approximately normally distributed and symmetric.
Question 7: The following defects were observed in a day's production of a ceramic
manufacturing company. Carry out Pareto analysis using Minitab.
Solution 7: Below is the Pareto analysis for the provided data

We can interpret from the above Pareto analysis that scratches and chips account for about 80%
of the defects in ceramics manufacturing.
