Assignment 2 - EPGCOM-10-006
Submitted by:
Nilay Singh Thakur, EPGCOM-10-029
Akash Suryavanshi, EPGCOM-10-003
Anuj Kumar, EPGCOM-10-006
Assignment - 2
Question 1: Two catalysts are being analyzed to determine how they affect the mean yield of
a chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable.
Since catalyst 2 is cheaper, it should be adopted, providing it does not change the process
yield. An experiment is run in the pilot plant and the results are in the data shown below. Is
there any difference between the mean yields? Assume equal variance. Also construct a box
plot for the yield data and do the normality checks. Provide your interpretation.
Observation Number   Catalyst 1   Catalyst 2
1                    95.8         89.19
2                    90.2         90.95
3                    94.2         90.46
4                    95.2         93
5                    91.79        97.19
6                    88.07        97.04
7                    93.72        91.07
8                    87.3         92.5
Solution 1: We assume that X (Catalyst 1) and Y (Catalyst 2) have normal distributions N(μX, σ²X) and N(μY, σ²Y), respectively.
(a) Test H0: μX = μY against H1: μX ≠ μY at α = 0.05, assuming that σ²X = σ²Y.
(b) Test H0: σ²X = σ²Y against H1: σ²X ≠ σ²Y at significance level α = 0.05.
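From the given data, we find that x̄ = 92.03, sx = 3.23, ȳ = 92.67, sy = 2.98, with n = m = 8.
a) The pooled standard deviation is
Sp = sqrt(((n-1) sx² + (m-1) sy²) / (n+m-2)) = 3.11
So the value of the test statistic is
t = (x̄ - ȳ) / (Sp sqrt(1/n + 1/m)) = -0.412
Since |t| = 0.412 < tα/2(n+m-2) = t0.025(14) = 2.145, we do not reject H0 at α = 0.05. There is no evidence of a difference between the mean yields of the two catalysts.
b) Since s²x / s²y = (3.23)² / (2.98)² = 1.17 < Fα/2(n-1, m-1) = F0.025(7, 7) = 4.99, we do not reject H0 at α = 0.05, so the equal-variance assumption used in (a) is reasonable.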
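As a cross-check of the hand calculation, the pooled t statistic and the variance ratio can be recomputed from the raw yields. This is only a minimal sketch using the Python standard library; the critical values are the tabled ones quoted in the solution, not computed by the code, and the variable names are our own.

```python
import math
import statistics

# Yield data from the question (x = catalyst 1, y = catalyst 2).
x = [95.8, 90.2, 94.2, 95.2, 91.79, 88.07, 93.72, 87.3]
y = [89.19, 90.95, 90.46, 93, 97.19, 97.04, 91.07, 92.5]

n, m = len(x), len(y)
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
var_x, var_y = statistics.variance(x), statistics.variance(y)  # sample variances

# Pooled standard deviation under the equal-variance assumption.
sp = math.sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))

# Two-sample t statistic with n + m - 2 = 14 degrees of freedom.
t_stat = (mean_x - mean_y) / (sp * math.sqrt(1 / n + 1 / m))

# Variance ratio used for the F test of equal variances.
f_ratio = var_x / var_y

T_CRIT = 2.145  # t_0.025(14), taken from tables
F_CRIT = 4.99   # F_0.025(7, 7), taken from tables

print(f"sp = {sp:.3f}, t = {t_stat:.3f}, F = {f_ratio:.3f}")
print("reject equal means:", abs(t_stat) > T_CRIT)
print("reject equal variances:", f_ratio > F_CRIT)
```

Both comparisons come out as "do not reject", which agrees with the conclusions in (a) and (b).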
a) Compute regression coefficients along with their 95% confidence limits and 95% prediction
limits line. Provide their meaning.
c) Should you use the model to predict the delivery time for a customer who is receiving 600
cases of soft drinks? Why or Why not?
Regression model:
Regression Equation
Delivery Time = 26.05 + 0.13019 Number of cases
Coefficients
Term             Coef     SE Coef  T-Value  P-Value  VIF
Constant         26.05    1.19     21.84    0.000
Number of cases  0.13019  0.00624  20.86    0.000    1.00
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.33084  96.03%  95.81%     95.13%
Analysis of Variance
Source             DF  Adj SS   Adj MS   F-Value  P-Value
Regression         1   2364.01  2364.01  435.14   0.000
  Number of cases  1   2364.01  2364.01  435.14   0.000
Error              18  97.79    5.43
Total              19  2461.80
B0 = 26.05, B1 = 0.13019
Regression equation: Delivery Time = 26.05 + 0.13019 × Number of cases
B0 is the theoretical time it would take to make a delivery consisting of zero cases of
beverages. B1 is the expected increase in the delivery time for each
additional case of beverages.
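The 95% confidence limits of the coefficients asked for in (a) can be obtained from the Coef and SE Coef columns as estimate ± t × SE. The sketch below assumes the tabled value t0.025(18) = 2.101 (error DF = 18 in the ANOVA table); the function name is ours. The prediction limits need the raw data, so they are not covered here.

```python
# Coefficient estimates and standard errors copied from the Minitab table.
coefs = {
    "Constant": (26.05, 1.19),
    "Number of cases": (0.13019, 0.00624),
}

T_CRIT = 2.101  # t_0.025(18), taken from tables (assumption: two-sided 95% limits)


def conf_limits(estimate, se, t_crit=T_CRIT):
    """Return (lower, upper) limits as estimate -/+ t_crit * se."""
    margin = t_crit * se
    return estimate - margin, estimate + margin


limits = {term: conf_limits(b, se) for term, (b, se) in coefs.items()}

for term, (lo, hi) in limits.items():
    print(f"{term}: {lo:.4f} to {hi:.4f}")
```

This gives roughly 23.55 to 28.55 minutes for B0 and 0.117 to 0.143 minutes per case for B1. Neither interval contains zero, in line with the P-values of 0.000.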
b) Predict the average delivery time for a customer who is receiving 150 cases.
Y = 26.05 + 0.13019 X1
= 26.05 + 0.13019 (150)
= 45.58 min.
c) No. An order of 600 cases is outside the range of the data used to create the model, which
did not include any order of more than 340 cases, so the prediction would be an extrapolation.
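The prediction in (b) and the range restriction in (c) can be put together in one small function. This is a sketch under our own assumptions: the function and constant names are hypothetical, and only the upper limit of 340 cases is known from the answer above, so the lower end of the data range is not checked.

```python
B0, B1 = 26.05, 0.13019    # fitted coefficients from the Minitab output
MAX_CASES_OBSERVED = 340   # largest order in the data, as stated in (c)


def predict_delivery_time(cases):
    """Predicted mean delivery time in minutes for an order of `cases` cases."""
    if cases > MAX_CASES_OBSERVED:
        # Beyond the observed range the fitted line may no longer hold.
        raise ValueError(f"{cases} cases is outside the fitted range")
    return B0 + B1 * cases


print(round(predict_delivery_time(150), 2))  # order size from part (b)
```

Calling `predict_delivery_time(600)` raises an error instead of returning a number, which is the behaviour we want for part (c).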
d) R-square = 1 - (SSE/TSS)
= 1 - (97.79/2461.80)
= 1 - 0.0397
= 0.9603
So, here 96.03% of the variation in delivery time can be explained by the variation in the number of
cases delivered.
Coefficient of correlation:
R = Sqrt(R-square)
= Sqrt(0.9603)
= 0.9799 (positive, because the slope B1 is positive)
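The R-square and correlation figures can be checked directly from the sums of squares in the ANOVA table. A minimal sketch, with variable names of our own choosing:

```python
import math

sse = 97.79    # Error Adj SS from the ANOVA table
sst = 2461.80  # Total Adj SS from the ANOVA table

r_sq = 1 - sse / sst  # coefficient of determination
r = math.sqrt(r_sq)   # correlation; taken as positive because the slope is positive

print(f"R-square = {r_sq:.4f}, R = {r:.4f}")
```

The result agrees with the R-sq of 96.03% in the Model Summary.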
Question 3: The data contains prices for two tickets (in $), with online service charges, large
popcorn, and two medium soft drinks at a sample of six theatre chains. 38.25, 35, 35, 40, 33,
41
a) At the 0.05 level of significance, is there evidence that the mean price for two tickets is
different from $35?
c) What assumption about the population distribution is needed in (a) and (b)
Solution 3:
a)
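The 1-Sample t-test gives a sample mean of 37.04 and a p-value of 0.177. Since 0.177 > 0.05, we do not
reject H0: μ = 35, so at the 0.05 level of significance there is no evidence that the mean price for
two tickets is different from $35.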
b)
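The p-value is 0.177. Since 0.177 > 0.05, we fail to reject the null hypothesis. This does not prove that the null hypothesis is true; it only means the sample gives no significant evidence against it.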
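The test statistic behind this result can be recomputed from the six prices. The sketch below uses only the Python standard library, so it compares |t| with the tabled critical value t0.025(5) = 2.571 instead of computing the p-value; the variable names are our own.

```python
import math
import statistics

prices = [38.25, 35, 35, 40, 33, 41]  # data from the question
MU0 = 35                              # hypothesised mean price in dollars

n = len(prices)
mean = statistics.mean(prices)
sd = statistics.stdev(prices)               # sample standard deviation
t_stat = (mean - MU0) / (sd / math.sqrt(n))

T_CRIT = 2.571  # t_0.025(5), taken from tables
reject_h0 = abs(t_stat) > T_CRIT

print(f"mean = {mean:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Here |t| is well below 2.571, which agrees with the p-value of 0.177 being greater than 0.05.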
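c)
The t-test assumes that the prices are a random sample from a population that is approximately normally distributed.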
Question 4: In a study on the effectiveness of the synthetic automobile fuels, two factors are
of importance. Factor A is an additive that is to be tested at two levels and factor B is a
catalyst for which two levels have to be tested. Twenty automobiles are randomly selected for
the study and each of 4 treatments is randomly used in five different automobiles. The
efficiency ratings in percentages are given below.
Setup DoE and analyze using Minitab software.
v) What factors and levels do you consider the best to maximize efficiency?
Solution 4:
a)
When the two levels (Level 1 and Level 2) of both factors, additive and catalyst, are treated as
coded factors:
=0.7473
So, here about 74.73% of the variation in the experiment is accounted for by the model and only about 25.27% is
error, so the model describes the data reasonably well.
Here, in the Linear terms, the P-values for additive and catalyst are both > 0.05. This indicates there is not
sufficient statistical evidence that either the additive or the catalyst has a significant effect on its own, so
the main effects are not significant.
Here, in the 2-way interactions, the P-value for additive*catalyst is < 0.05. This indicates there is sufficient
statistical evidence that the additive and the catalyst have an impact in combination, so the interaction
effect is statistically significant.
When the catalyst is kept fixed at Level 1 and the additive is changed from Level 1 to Level 2, the
efficiency falls from the high to the low level.
When the catalyst is kept fixed at Level 2 and the additive is changed from Level 1 to Level 2, the
efficiency rises from the low to the high level.
Question 5: The following data represent the nationwide highest yield of different types of
Current Deposits (CD)
a) At the 0.05 level of significance, is there evidence of a difference in the mean yields of
the different accounts?
Solution 5:
The data describe the mean yields of the different accounts.
b)
c)
Pooled SD:
The pooled standard deviation is a weighted average of the standard deviations of two or more
groups. The group variances are averaged, with more "weight" given to larger
sample sizes, and the square root of the result is taken.
Once the pooled standard deviation has been calculated, SD pooled is used in place of SD1 and
SD2 in the formula for the standard error, along with the updated degrees of freedom formula (df =
n1 + n2 - 2).
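The description above can be written as a short calculation. This is a sketch, not Minitab's own routine: the function names are hypothetical, and the example input reuses the two catalyst groups from Question 1 (n = 8 each, SD 3.23 and 2.98) purely as an illustration.

```python
import math


def pooled_sd(groups):
    """groups: list of (n, sd) pairs. Returns (pooled SD, degrees of freedom)."""
    # Each group's variance is weighted by its own degrees of freedom, n - 1.
    weighted_var = sum((n - 1) * sd ** 2 for n, sd in groups)
    df = sum(n - 1 for n, _ in groups)
    return math.sqrt(weighted_var / df), df


def standard_error_diff(sp, n1, n2):
    """Standard error of the difference of two means, using the pooled SD."""
    return sp * math.sqrt(1 / n1 + 1 / n2)


sp, df = pooled_sd([(8, 3.23), (8, 2.98)])
se = standard_error_diff(sp, 8, 8)
print(f"SD pooled = {sp:.3f}, df = {df}, SE = {se:.3f}")
```

For two groups the degrees of freedom reduce to n1 + n2 - 2, as in the formula above; the same function also works for more than two groups.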
Question 6: The following is a set of data from a sample of n=11 items. Get all descriptive
statistics from Minitab. 21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54
What are the 95% confidence limits for the mean? Write the interpretation.
What are the meanings of "skewness" and "kurtosis"?
Statistics
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00
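So here, the 95% confidence limits for the mean are 17.29 and 36.34. This means we are 95% confident that
the population mean lies between 17.29 and 36.34; it does not mean that 95% of the data lie in this range.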
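The limits follow from mean ± t × SE Mean with n - 1 = 10 degrees of freedom. A minimal sketch of that calculation, assuming the tabled value t0.025(10) = 2.228 and using our own variable names:

```python
import math
import statistics

data = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]  # sample from the question

n = len(data)
mean = statistics.mean(data)
se_mean = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean

T_CRIT = 2.228  # t_0.025(10), taken from tables
lower = mean - T_CRIT * se_mean
upper = mean + T_CRIT * se_mean

print(f"mean = {mean:.2f}, SE = {se_mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The mean and SE Mean agree with the Minitab output (26.82 and 4.27), and the interval matches the limits quoted above to within rounding.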
BASIS FOR COMPARISON: SKEWNESS and KURTOSIS
Meaning
  Skewness: Skewness refers to the tendency of a distribution that determines its symmetry about the mean.
  Kurtosis: Kurtosis is a measure of the sharpness of the curve in the frequency distribution.
Represents
  Skewness: Amount and direction of the skew.
  Kurtosis: How tall and sharp the central peak is.
Skewness and kurtosis of given sample data
Variable  N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum  Skewness  Kurtosis
Samples   11  0   26.82  4.27     14.18  5.00     15.00  27.00   34.00  54.00    0.47      0.16
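So, with skewness 0.47 and kurtosis 0.16, both fairly close to 0, we can say that the data are roughly symmetric, with a slight skew to the right and a peak close to that of a normal distribution.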
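For reference, the two values can be recomputed from the sample with the usual bias-adjusted sample formulas. That these are the formulas behind the Minitab figures is our assumption, but for this sample they give values that round to the reported 0.47 and 0.16.

```python
import statistics

data = [21, 15, 24, 5, 28, 30, 34, 12, 27, 45, 54]  # sample from the question

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)           # sample standard deviation
z = [(x - mean) / s for x in data]   # standardised values

# Adjusted sample skewness: n / ((n-1)(n-2)) * sum(z^3)
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Adjusted sample excess kurtosis (0 for a normal distribution).
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")
```

A positive skewness means the right tail is a little longer, which fits the two large values 45 and 54 in the sample.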