Assignment 2 - EPGCOM-10-006


EPGCOM-10 Course: Six Sigma

Submitted by:
Nilay Singh Thakur, EPGCOM-10-029
Akash Suryavanshi, EPGCOM-10-003
Anuj Kumar, EPGCOM-10-006
Assignment - 2
Question 1: Two catalysts are being analyzed to determine how they affect the mean yield of
a chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable.
Since catalyst 2 is cheaper, it should be adopted, providing it does not change the process
yield. An experiment is run in the pilot plant and the results are in the data shown below. Is
there any difference between the mean yields? Assume equal variance. Also construct a box
plot for the yield data and do the normality checks. Provide your interpretation

Observation Number   Catalyst 1   Catalyst 2
1                    95.8         89.19
2                    90.2         90.95
3                    94.2         90.46
4                    95.2         93
5                    91.79        97.19
6                    88.07        97.04
7                    93.72        91.07
8                    87.3         92.5

Solution 1: We assume that X (Catalyst 1) and Y (Catalyst 2) have normal distributions N (μX,
sigma^2X) and N (μY, sigma^2Y), respectively.

(a) Test H0: μX = μY against H1: μX not equal to μY at alpha = 0.05. Assume that sigma^2X =
sigma^2Y.

(b) Test H0: sigma^2X = sigma^2Y against H1: sigma^2X not equal to sigma^2Y with significance
level alpha = 0.05.

Variable    N  N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Catalyst-1  8  0   92.03  1.14     3.24   87.30    88.60  92.75   94.95  95.80
Catalyst-2  8  0   92.67  1.05     2.98   89.19    90.58  91.78   96.03  97.19

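From the given data, we find that x̄ = 92.03, sx = 3.24, ȳ = 92.67, sy = 2.98.

The difference in sample means is 92.67 - 92.03 = 0.64.

a) The pooled standard deviation is

Sp = Sqrt((7*(3.24)^2 + 7*(2.98)^2) / 14)

Sp = 3.11

So the value of the test statistic is

t = (92.035 - 92.675) / (3.11 * sqrt(1/8 + 1/8))

t = -0.412

Since |t| = 0.412 < t a/2 (n+m-2) = t0.025 (14) = 2.145, we do not reject H0 at a = 0.05.
There is no evidence of a difference between the mean yields, which supports adopting the
cheaper catalyst 2.

b) Since S^2x / S^2y = (3.24)^2 / (2.98)^2 = 1.18 < Fa/2 (n-1, m-1) = F0.025 (7, 7) = 4.99,
we do not reject H0: sigma^2X = sigma^2Y. The equal-variance assumption used in (a) is
reasonable.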
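
The hand calculation can be cross-checked with a short Python sketch that uses only the standard library. It recomputes the pooled standard deviation, the t statistic and the variance ratio from the raw yields. The critical values 2.145 and 4.99 are the ones quoted in the solution, not computed here, and the variable names are illustrative.

```python
import math
import statistics

# Yields copied from the data table in Question 1.
catalyst_1 = [95.8, 90.2, 94.2, 95.2, 91.79, 88.07, 93.72, 87.3]
catalyst_2 = [89.19, 90.95, 90.46, 93, 97.19, 97.04, 91.07, 92.5]

n, m = len(catalyst_1), len(catalyst_2)
var_1 = statistics.variance(catalyst_1)  # sample variance (n - 1 divisor)
var_2 = statistics.variance(catalyst_2)

# Pooled standard deviation under the equal-variance assumption.
pooled_sd = math.sqrt(((n - 1) * var_1 + (m - 1) * var_2) / (n + m - 2))

# Two-sample t statistic for H0: equal means.
t_stat = (statistics.mean(catalyst_1) - statistics.mean(catalyst_2)) / (
    pooled_sd * math.sqrt(1 / n + 1 / m)
)

# Variance ratio for H0: equal variances (larger variance in the numerator).
f_ratio = max(var_1, var_2) / min(var_1, var_2)

T_CRIT = 2.145  # t_0.025(14), quoted in the solution
F_CRIT = 4.99   # F_0.025(7, 7), quoted in the solution

reject_equal_means = abs(t_stat) > T_CRIT
reject_equal_variances = f_ratio > F_CRIT

print(f"pooled sd = {pooled_sd:.2f}, t = {t_stat:.3f}, F = {f_ratio:.2f}")
print("reject equal means:", reject_equal_means)
print("reject equal variances:", reject_equal_variances)
```

The box plot and normality checks requested in the question need a plotting or statistics package and are not covered by this sketch.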


Question 2: Management of a soft drink bottling company wants to develop a method for
allocating delivery costs to the customers. Although one cost clearly related to travel time
within a particular route, another variable cost reflects the time required to unload the cases
of soft drink at the delivery point. A sample of 20 deliveries within a delivery was selected.
The delivery times and the number of cases delivered are given below.

a) Compute regression coefficients along with their 95% confidence limits and 95% prediction
limits line. Provide their meaning.

b) Predict the delivery time of 150 cases?

c) Should you use the model to predict the delivery time for a customer who is receiving 600
cases of soft drinks? Why or Why not?

d) Determine r^2 and r^2(adj). What is their meaning?

Solution 2: Scatter plot


A)

Regression model:

Regression Equation
Delivery Time = 26.05 + 0.13019 Number of cases

Coefficients
Term             Coef     SE Coef  T-Value  P-Value  VIF
Constant         26.05    1.19     21.84    0.000
Number of cases  0.13019  0.00624  20.86    0.000    1.00

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.33084  96.03%  95.81%     95.13%

Analysis of Variance
Source             DF  Adj SS   Adj MS   F-Value  P-Value
Regression         1   2364.01  2364.01  435.14   0.000
  Number of cases  1   2364.01  2364.01  435.14   0.000
Error              18  97.79    5.43
Total              19  2461.80

Fits and Diagnostics for Unusual Observations
Obs  Delivery Time  Fit     Resid  Std Resid
13   57.200         52.219  4.981  2.20  R
R  Large residual
B0 = 26.05

B1 = 0.13019

Regression equation:

Y = 26.05 + 0.13019 X, where X is the number of cases

B0 is the theoretical time it would take to make a delivery consisting of zero cases of
beverages. B1 is the expected increase in the delivery time for each
additional case of beverages.

b) Predict the average delivery time for a customer who is receiving 150 cases.

Y = 26.05 + 0.13019 X

Y = 26.05 + 0.13019 x 150

= 45.58 min.
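
Part (a) also asks for 95% confidence limits on the coefficients. A minimal sketch, assuming the usual interval coefficient ± t × SE with 18 error degrees of freedom, is given below. The coefficients and standard errors come from the Minitab output, with the slope kept unrounded at 0.13019. The critical value t0.025(18) = 2.101 is a table value that does not appear in the original, and `predict_delivery_time` is an illustrative helper name.

```python
# Coefficients and standard errors copied from the Minitab output.
coefficients = {
    "Constant": (26.05, 1.19),
    "Number of cases": (0.13019, 0.00624),
}

T_CRIT_18 = 2.101  # assumed table value of t_0.025(18); not in the original

# 95% confidence limits: coefficient +/- t * standard error.
confidence_limits = {
    term: (coef - T_CRIT_18 * se, coef + T_CRIT_18 * se)
    for term, (coef, se) in coefficients.items()
}


def predict_delivery_time(cases):
    """Fitted delivery time in minutes for a given number of cases."""
    intercept = coefficients["Constant"][0]
    slope = coefficients["Number of cases"][0]
    return intercept + slope * cases


for term, (low, high) in confidence_limits.items():
    print(f"{term}: 95% CI = ({low:.4f}, {high:.4f})")
print(f"predicted time for 150 cases = {predict_delivery_time(150):.2f} min")
```

The slope interval excludes zero, which agrees with the p-value of 0.000 reported for the number of cases. Prediction limits for an individual delivery need the sample mean and spread of the number of cases, which are not reproduced in this text, so they are left out.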

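C) No, because the data used to create the model did not include any order of more
than 340 cases. Predicting for 600 cases would extrapolate well beyond the observed range,
where the fitted line may no longer hold.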

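D) R-square = 1 - (SSE/TSS)

= 1 - (97.79/2461.80)

= 1 - 0.0397

= 0.9603

So, here 96.03% of the variation in delivery time can be explained by variation in the number of
cases delivered. R-sq(adj), reported by Minitab as 95.81%, adjusts R-square for the number of
predictors in the model; with a single predictor it is only slightly lower than R-square.

Coefficient of correlation:

R = Sqrt(R-square)

= Sqrt(0.9603)

= 0.9799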
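
The R-square arithmetic can be checked from the sums of squares in the Analysis of Variance table. The sketch below also applies the standard adjusted R-square formula, 1 - (SSE/(n - p - 1)) / (TSS/(n - 1)), with n = 20 deliveries and p = 1 predictor. The variable names are illustrative.

```python
import math

# Sums of squares copied from the Analysis of Variance table.
sse = 97.79      # Error
tss = 2461.80    # Total
n_obs = 20       # deliveries in the sample
n_predictors = 1

r_sq = 1 - sse / tss

# Adjusted R-square penalises the error term for the degrees of freedom used.
adj_r_sq = 1 - (sse / (n_obs - n_predictors - 1)) / (tss / (n_obs - 1))

# With one predictor and a positive slope, r is the positive root of R-square.
r = math.sqrt(r_sq)

print(f"R-sq = {r_sq:.4f}, R-sq(adj) = {adj_r_sq:.4f}, r = {r:.4f}")
```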
Question 3: The data contains prices for two tickets (in $), with online service charges, large
popcorn, and two medium soft drinks at a sample of six theatre chains. 38.25, 35, 35, 40, 33,
41

a) At the 0.05 level of significance, is there evidence that mean price for two tickets, is
different from $35?

b) Determine the p-value and interpret its meaning

c) What assumption about the population distribution is needed in (a) and (b)

d) Do you think the assumption stated in (c) is seriously violated?

Solution 3:

a)

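The 1-Sample t-test gives a sample mean of $37.04 and t = 1.57 with 5 degrees of freedom. Since
|t| = 1.57 < t0.025(5) = 2.571, we do not reject H0: μ = 35. At the 0.05 level of significance
there is no evidence that the mean price for two tickets is different from $35.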

b)

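The p-value is 0.177. If the mean price were really $35, a sample mean at least this far from $35
would occur about 17.7% of the time. Since 0.177 > 0.05, we do not reject the null hypothesis.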

c)

We assumed in (a) and (b) that the population of prices is normally distributed.
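
A short Python sketch of the one-sample t statistic for these six prices is given below. It uses the standard library only, so it compares |t| with the table value t0.025(5) = 2.571 (an assumed table value that is not in the original) instead of computing the p-value.

```python
import math
import statistics

# Prices for two tickets, popcorn and drinks, copied from Question 3.
prices = [38.25, 35, 35, 40, 33, 41]
MU_0 = 35  # hypothesised mean price in dollars

n = len(prices)
sample_mean = statistics.mean(prices)
sample_sd = statistics.stdev(prices)  # n - 1 divisor

# t statistic for H0: mean = MU_0.
t_stat = (sample_mean - MU_0) / (sample_sd / math.sqrt(n))

T_CRIT_5 = 2.571  # assumed table value of t_0.025(5)
reject_h0 = abs(t_stat) > T_CRIT_5

print(f"mean = {sample_mean:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```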

Question 4: In a study on the effectiveness of the synthetic automobile fuels, two factors are
of importance. Factor A is an additive that is to be tested at two levels and factor B is a
catalyst for which 2 levels has to be tested. Twenty automobiles are randomly selected for
the study and each of 4 treatments is randomly used in five different automobiles. The
efficiency ratings in percentages are given below.
Setup DoE and analyze using Minitab software.

i) What do you mean by coded and un-coded analysis?

ii) Write the transfer function

iii) Explain the terms such as effect, p-value and Seq SS

iv) What are your interpretations of graphs?

v) What factors and levels you consider as the best to maximize efficiency?

Solution 4:

a)

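Coded and Un-coded Analysis:

In un-coded analysis the factors are entered with their actual levels (Level 1 and Level 2). In
coded analysis, Level 1 and Level 2 of both the additive and the catalyst are treated as the
Low Level and the High Level and entered as -1 and +1 in coded units.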


Un Coded Factor:

SL no Additive catalyst Efficiency


1 Level-1 Level-2 64
2 Level-2 Level-1 50
3 Level-1 Level-1 72
4 Level-1 Level-2 62
5 Level-1 Level-1 70
6 Level-2 Level-1 58
7 Level-2 Level-2 74
8 Level-2 Level-2 70
9 Level-1 Level-2 59
10 Level-2 Level-2 68
11 Level-2 Level-1 46
12 Level-2 Level-2 68
13 Level-1 Level-2 50
14 Level-2 Level-1 53
15 Level-1 Level-1 65
16 Level-1 Level-1 75
17 Level-1 Level-2 68
18 Level-2 Level-1 56
19 Level-1 Level-1 67
20 Level-2 Level-2 70

Coded Factor:

SL no Additive catalyst Efficiency


1 -1 1 64
2 1 -1 50
3 -1 -1 72
4 -1 1 62
5 -1 -1 70
6 1 -1 58
7 1 1 74
8 1 1 70
9 -1 1 59
10 1 1 68
11 1 -1 46
12 1 1 68
13 -1 1 50
14 1 -1 53
15 -1 -1 65
16 -1 -1 75
17 -1 1 68
18 1 -1 56
19 -1 -1 67
20 1 1 70
R-sq = (76.05 + 84.05 + 884.45)/1405.75

= 0.7431

So, here 74.31% of the variation in the experiment is accounted for by the model and only 25.69%
is error, so the model describes the data reasonably well.
Here, for the linear terms, the P values for additive and catalyst are > 0.05, which indicates
there is not sufficient statistical evidence that either additive or catalyst has an effect
individually; the main effects are not significant.

For the 2-way interaction, additive*catalyst, the P value is < 0.05, which indicates there is
sufficient statistical evidence that additive and catalyst have an impact in combination; the
interaction effect is statistically significant.
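
The sums of squares used for R-sq can be reproduced from the coded data table. The sketch below assumes the standard results for a balanced two-level factorial: an effect is the mean response at +1 minus the mean response at -1, its sum of squares is N × (effect/2)^2, and the coded regression coefficient is half the effect. The names `effect` and `runs` are illustrative, and the printed coded equation is computed here, not copied from the Minitab output.

```python
# (additive, catalyst, efficiency) in coded units, copied from the coded table.
runs = [
    (-1, 1, 64), (1, -1, 50), (-1, -1, 72), (-1, 1, 62), (-1, -1, 70),
    (1, -1, 58), (1, 1, 74), (1, 1, 70), (-1, 1, 59), (1, 1, 68),
    (1, -1, 46), (1, 1, 68), (-1, 1, 50), (1, -1, 53), (-1, -1, 65),
    (-1, -1, 75), (-1, 1, 68), (1, -1, 56), (-1, -1, 67), (1, 1, 70),
]
n_runs = len(runs)
grand_mean = sum(y for _, _, y in runs) / n_runs


def effect(contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [y for a, b, y in runs if contrast(a, b) == 1]
    low = [y for a, b, y in runs if contrast(a, b) == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {
    "Additive": effect(lambda a, b: a),
    "Catalyst": effect(lambda a, b: b),
    "Additive*Catalyst": effect(lambda a, b: a * b),
}

# Balanced two-level design: SS = N * (effect / 2) ** 2.
seq_ss = {term: n_runs * (e / 2) ** 2 for term, e in effects.items()}
total_ss = sum((y - grand_mean) ** 2 for _, _, y in runs)
r_sq = sum(seq_ss.values()) / total_ss

# Coded coefficients are half the effects; the constant is the grand mean.
coded_coefs = {term: e / 2 for term, e in effects.items()}

for term in effects:
    print(f"{term}: effect = {effects[term]:.2f}, Seq SS = {seq_ss[term]:.2f}")
print(f"Total SS = {total_ss:.2f}, R-sq = {r_sq:.4f}")
print(
    f"Efficiency = {grand_mean:.2f} "
    f"{coded_coefs['Additive']:+.2f} Additive "
    f"{coded_coefs['Catalyst']:+.2f} Catalyst "
    f"{coded_coefs['Additive*Catalyst']:+.2f} Additive*Catalyst"
)
```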

Main effect plot:

The main effects plot suggests keeping the additive at -1 (Level 1) and the catalyst at +1 (Level 2) to maximize the efficiency.

Interaction effect plot:

When the catalyst is kept fixed at Level 1 and the additive is changed from Level 1 to Level 2, the
efficiency falls from high to low.

When the catalyst is kept fixed at Level 2 and the additive is changed from Level 1 to Level 2, the
efficiency rises from low to high.
Question 5: The following data represent the nationwide highest yield of different types of
Current Deposits (CD)

a) At the 0.05 level of significance, is there evidence of a difference in the means yield of
different account?

b) State the hypothesis used here.

c) What do you mean by pooled standard deviation?

Solution 5:
Data describe means yields of different accounts.

b)

Hypothesis Used here:

Null Hypothesis: All means are equal

Alternate Hypothesis: Not all means are equal

c)

Pooled SD:

The pooled standard deviation is a single estimate of the common standard deviation of two or
more groups. It is the square root of a weighted average of the group variances, with more
“weight” given to groups with larger sample sizes.

Once the pooled standard deviation has been calculated, SD pooled is used in place of SD1 and
SD2 in the formula for the standard error, along with an updated degrees of freedom formula
(for two groups, df = n1 + n2 - 2).

For this question, Pooled SD is 0.0711056
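
The definition generalises to any number of groups. The sketch below is a hypothetical helper, `pooled_sd`, that weights each group variance by its degrees of freedom. Because the CD yield table is not reproduced in this text, it is demonstrated on the catalyst yields from Question 1, not on the data behind the 0.0711056 figure.

```python
import math
import statistics


def pooled_sd(groups):
    """Square root of the degrees-of-freedom-weighted mean of group variances."""
    weighted_sum = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    total_df = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted_sum / total_df)


# Demonstration data: catalyst yields from Question 1 (not the CD yields).
catalyst_1 = [95.8, 90.2, 94.2, 95.2, 91.79, 88.07, 93.72, 87.3]
catalyst_2 = [89.19, 90.95, 90.46, 93, 97.19, 97.04, 91.07, 92.5]

result = pooled_sd([catalyst_1, catalyst_2])
print(f"pooled sd = {result:.2f}")
```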

Question 6: The following is a set of data from a sample of n=11 items. Get all descriptive
statistics from Minitab. 21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54

What are the 95% confidence limits for the mean? Write the interpretation.
What are the meanings of "skewness" and "kurtosis"?

Do you believe the underlying distribution for these data is normal?

Solution 6: Below is the Descriptive analysis for the provided data

Statistics
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00

N   Mean   StDev  SE Mean  95% CI for μ
11  26.82  14.18  4.27     (17.29, 36.34)
μ: mean of Samples

So here, the 95% CI means that we are 95% confident that the population mean lies between
17.29 and 36.34. It does not mean that 95% of the data lie in this range.
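
The interval can be reproduced as mean ± t × SE Mean. The sketch below uses the standard library only, so the critical value t0.025(10) = 2.228 is an assumed table value that does not appear in the original.

```python
import math
import statistics

# Sample of n = 11 items, copied from Question 6.
samples = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]

n = len(samples)
sample_mean = statistics.mean(samples)
sample_sd = statistics.stdev(samples)  # n - 1 divisor
se_mean = sample_sd / math.sqrt(n)

T_CRIT_10 = 2.228  # assumed table value of t_0.025(10)

# 95% confidence limits for the population mean.
lower = sample_mean - T_CRIT_10 * se_mean
upper = sample_mean + T_CRIT_10 * se_mean

print(f"mean = {sample_mean:.2f}, SE = {se_mean:.2f}")
print(f"95% CI for the mean = ({lower:.2f}, {upper:.2f})")
```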

Comparison Chart of Skewness and Kurtosis:

BASIS FOR COMPARISON: Meaning
SKEWNESS: Skewness refers to the tendency of a distribution that determines its symmetry about the mean.
KURTOSIS: Kurtosis means the measure of the respective sharpness of the curve in the frequency distribution.

BASIS FOR COMPARISON: Measure for
SKEWNESS: Degree of lopsidedness in the distribution.
KURTOSIS: Degree of tailedness in the distribution.

BASIS FOR COMPARISON: What is it?
SKEWNESS: It is an indicator of lack of equivalence in the frequency distribution.
KURTOSIS: It is the measure of data, which is either peaked or flat in relation to the normal distribution.

BASIS FOR COMPARISON: Represents
SKEWNESS: Amount and direction of the skew.
KURTOSIS: How tall and sharp the central peak is.
Skewness and kurtosis of given sample data

Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum  Skewness  Kurtosis
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00    0.47      0.16

The data can be considered normal on this evidence:


Statistics
Variable  Mean   Median  Skewness  Kurtosis
Samples   26.82  27.00   0.47      0.16

1. The p-value read from the normality graph is large (reported as 1), so we do not reject the
null hypothesis that the data are normal.
2. There is not much difference between the mean and the median.
3. Kurtosis is near zero, which shows tails similar to those of a normal distribution.
4. Skewness is also close to zero, which shows the data are roughly symmetrical.

So we can say that the data are approximately symmetric and consistent with a normal distribution.
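
The reported skewness and kurtosis can be approximated with the bias-adjusted sample formulas that statistics packages commonly use. That Minitab uses exactly these formulas is an assumption; the original only shows the output. With this sample the sketch gives values that round to the reported 0.47 and 0.16.

```python
import statistics

# Sample of n = 11 items, copied from Question 6.
samples = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]

n = len(samples)
sample_mean = statistics.mean(samples)
sample_sd = statistics.stdev(samples)  # n - 1 divisor

# Sums of third and fourth powers of the standardised values.
z3 = sum(((x - sample_mean) / sample_sd) ** 3 for x in samples)
z4 = sum(((x - sample_mean) / sample_sd) ** 4 for x in samples)

# Bias-adjusted sample skewness.
skewness = n / ((n - 1) * (n - 2)) * z3

# Bias-adjusted excess kurtosis (zero for a normal distribution).
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")
```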

Question 7: The following defects were observed in a day’s production of a ceramic
manufacturing company. Carry out Pareto analysis using Minitab.

Solution 7: Below is the Pareto analysis for the provided data


We can interpret from the above Pareto analysis that scratches and chips account for about 80% of
the defects in ceramics manufacturing.
