Six Sigma Green Belt 3.ANALYSE (IASSC)

Six Sigma
www.invensislearning.com
3.0 Analyze Phase
3.1 Patterns of Variation
Lean Tools - Value Add (VA) and Non-Value Add (NVA) Analysis
• The objective of the VA/NVA analysis is to:
• Identify and eliminate the hidden costs that do not add value to the customer
• Reduce unnecessary process complexity, and thus errors
• Method:
• Classify each process step as value-added (also known as "customer value-add"), business non-value-add
(sometimes called "required waste"), and non-value-add
• Add up the time spent in each category
• Decide what to do next.
• Value-add tasks should be optimized and standardized
• Business non-value-add tasks should be checked with the customer and, where possible, minimized or
eliminated
• Non-value-add activities should be eliminated
• Value-Added or Customer Value-Added
• Must be performed to meet customer needs
• Adds form or feature to the service
• Enhances service quality, enables on-time or more competitive delivery or has a positive impact on
price competition
• Customers would be willing to pay for this work if they knew you were doing it
• Non-Value-Added
• Rework, duplication, waiting, etc.
• Business Value-Added
• Internal requirements, e.g., compliance
• Lead Time
• The time between order and delivery
• Cycle Time (C/T)
• The time taken at each step to create a product/service element
• Takt Time
• The pace of customer demand (available work time divided by customer demand)
• Process Time (P/T)
• The time taken to produce one item when one operator is working on one product at a time, in which case it equals C/T (in batch processing, C/T = (P/T) / no. of items produced)
A value stream map pinpoints value-add and non-value-add activities.
[Figure: value stream map. Labels: Production, Sales, Suppliers, Customer, Forecasts, Demand. Process steps: Components, Subassembly, Final Assembly, Test, Stage, Ship, each with inventory (I). Waiting times: 4 weeks, 4 days, 3 days, 5 days, 10 days; total 42 days. Processing times: 20 min, 42 min, 10 min, 15 min, 5 min; total 92 minutes.]
[Figure: the same value stream map annotated with problem areas: high variation, high defect rate, excessive inventory, and long set-up times.]
What steps can I modify to deliver an improved process to my customers?
• Eliminate: Stop doing the process step entirely
• Combine: Flow by eliminating the wait/inventory between 2 steps
• Pull: Control inventory between steps at a fixed level and only produce to that level
• Separate from the critical path: Perform steps in parallel
• Mitigate the impact (inventory/error rate etc.): Improve the performance/predictability of a highly variable process
Takt Time
• Takt Time Calculation
• The takt time is the amount of available work time divided by the customer demand during that time period
• Example:
• Work Schedule: 8 hours/day = Total of 480 minutes in a day
• No. of shipments to handle in a day = 150
• Takt time = 480 (minutes)/150 = One shipment for every 3.2 minutes
• Any VA step in a process map that takes longer than the takt time is considered a time trap
• Divide the total time for the process by the takt time to get a rough estimate of the staff required to operate the process
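The takt time example can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names are made up for this sketch, and the 12 minutes of total process time per shipment is a hypothetical figure that does not come from the example.

```python
import math

def takt_time(available_minutes: float, demand_units: float) -> float:
    """Available work time divided by customer demand in the same period."""
    return available_minutes / demand_units

def rough_staffing(total_process_minutes: float, takt: float) -> int:
    """Rough head count: total process time divided by takt time, rounded up."""
    return math.ceil(total_process_minutes / takt)

takt = takt_time(480, 150)  # 8-hour day, 150 shipments per day
print(f"Takt time: {takt:.1f} minutes per shipment")

# Hypothetical figure: assume each shipment needs 12 minutes of total work.
staff = rough_staffing(12, takt)
print("Rough staff estimate:", staff)
```

Rounding up is a design choice of this sketch: a fractional operator cannot be scheduled, so the estimate is taken to the next whole person.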
3.1.1 Multi-Vari Analysis
Multi-Vari Studies
Multi-Vari studies analyze variation, investigate process stability, identify investigation areas, and
break down the variation.
They classify variation sources into three major types:
• Positional: Variations within a single unit where variation is due to location. Examples: Pallet stacking in a truck, temperature gradient in an oven, the variation observed from cavity-to-cavity within a mold, a region of a country, the line on the invoice
• Cyclical: Variations among sequential repetitions over a short time. Examples: Every n’th pallet broken, batch-to-batch variation, lot-to-lot variation, invoices received day-to-day and account activity week-to-week
• Temporal: Variations which occur over longer periods of time. Examples: Process drift, performance before and after breaks, seasonal and shift based differences, month-to-month closings, and quarterly returns
• Use Multi-Vari Chart as a preliminary tool to investigate variation in your data, including cyclical
variations and interactions between factors.
• A multi-vari chart provides a graphical representation of the relationships between factors and a
response.
• The multi-vari chart displays the means at each factor level for every factor. In Minitab, each multi-vari
chart can display up to four factors.
• For Example, a manufacturer produces plastic pipes using two different machines with three
temperature settings. The quality engineer is concerned about the consistency of pipe diameters from
the different machines and settings. The engineer creates a multi-vari chart to investigate the variation in
pipe diameters.
Example of Multi-Vari Chart

An engineer wants to assess the effect of sintering time on the compressive strength of three different metals. The
engineer measures the compressive strength of five specimens of each metal type at each sintering time: 100 minutes,
150 minutes, and 200 minutes.
• The engineer creates a multi-vari chart to look for possible trends and interactions in the data.
• Open the sample data, SinteringTime.MTW.
• Choose Stat > Quality Tools > Multi-Vari Chart.
• In Response, enter Strength.
• In Factor 1, enter SinterTime.
• In Factor 2, enter MetalType.
• Click OK.
Interpret the results
• The multi-vari chart indicates a possible interaction between the type of metal and the length of sintering time. The
greatest compressive strength for Metal Type 1 is obtained by sintering for 100 minutes, for Metal Type 2 by sintering
for 150 minutes, and for Metal Type 3 by sintering for 200 minutes.
Multi-Vari Chart Example

The data shows that the strength varies differently across sintering times for different metal types, indicating
an interaction.
Create Multi-Vari Chart

The five steps to create a Multi-Vari chart are:
1. Select Process and Characteristics. Example: Select the process where the plate is being manufactured and measure its thickness within a specified range.
2. Decide Sample Size. Example: Sample size is five pieces from each equipment and the frequency of data collection is every two hours.
3. Create a Tabulation Sheet. Example: The tabulation sheet with data records contains the columns with time, equipment number, and thickness as headers.
4. Plot the Chart. Example: Chart is plotted with time on the X axis and the plate thickness on the Y axis.
5. Link the Observed Values. Example: The observed values are linked by appropriate lines.
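The tabulation step can be sketched in Python as follows. The thickness readings below are hypothetical values invented for illustration, and only two pieces per group are shown to keep the sketch short. Each (time, equipment) group yields a minimum, mean, and maximum, which are the quantities a multi-vari chart draws as a vertical line with a linked mean point.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tabulation sheet: (time, equipment number, thickness in mm).
records = [
    ("08:00", 1, 5.02), ("08:00", 1, 5.05), ("08:00", 2, 4.96), ("08:00", 2, 4.99),
    ("10:00", 1, 5.08), ("10:00", 1, 5.04), ("10:00", 2, 5.01), ("10:00", 2, 4.97),
]

# Group the observed thickness values by time and equipment.
groups = defaultdict(list)
for time, equipment, thickness in records:
    groups[(time, equipment)].append(thickness)

# One entry per vertical line on the chart: (min, mean, max) of the group.
summary = {
    key: (min(vals), mean(vals), max(vals)) for key, vals in sorted(groups.items())
}

for (time, equipment), (low, avg, high) in summary.items():
    print(f"{time}  equipment {equipment}: min={low:.2f} mean={avg:.3f} max={high:.2f}")
```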
The path to create a Multi-Vari chart in Minitab is:

Minitab > Stat > Quality Tools > Multi-Vari Chart
3.1.2 Classes of Distributions
The data obtained from the measurement phase exhibits a variety of distributions, depending on the data type and its source.
The methods used to describe the parameters for classes of distribution are:
• Probability: Based on an assumed model of distribution; used to find the chances of a certain outcome/event occurring.
• Statistics: Uses the measured data to determine a model to describe the data used.
• Inferential Statistics: Describes the population parameters based on the sample data using a particular model.
Types of Distributions
The two types of distribution are as follows:
• Discrete Distribution: Binomial distribution, Poisson distribution
• Continuous Distribution: Normal distribution, Chi-square distribution, t-distribution, F-distribution
Binomial Distribution
The binomial distribution is a probability distribution for discrete data.
Characteristics of Binomial Distribution:
• Predicts sample behavior
• Describes the discrete data as a result of a particular process
• Best suited when the sample size is less than thirty and less than ten percent of the population
• Used to deal with defective items
$P(r) = {}^{n}C_{r} \, p^{r} (1 - p)^{n - r}$
where P(r) = probability of exactly r successes out of a sample size of n
p = probability of success; r = number of successes desired; n = sample size
Some of the key calculations of binomial distribution are shown.
Term: Formula
• Mean: $\mu = np$
• Standard Deviation: $\sigma = \sqrt{np(1 - p)}$
where n = sample size and p = probability of success
• Sample factorial calculation: 5! = 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1 = 120; 4! = 4 ∗ 3 ∗ 2 ∗ 1 = 24
Calculating Binomial Distribution - Example

A sample of size six is randomly selected from a batch that is 14.28% nonconforming. Find the probability that the sample has exactly two nonconforming units.
From the problem statement we know that
n = 6
x = 2
p = 0.1428
Filling in the numbers, we have
$P(X=2) = \frac{6!}{2!\,(6-2)!} (0.1428)^{2} (0.8572)^{6-2}$
$= \frac{720}{(2)(24)} (0.0204)(0.5399)$
$= (15)(0.0204)(0.5399)$
$\approx 0.1651$
Thus, the probability that the sample contains exactly two nonconforming units is 0.1651.
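The same calculation can be checked with a short Python sketch. The function name `binomial_pmf` is an illustrative choice; `math.comb` supplies the number of combinations, and the mean and standard deviation use the binomial formulas np and the square root of np(1 − p).

```python
from math import comb, sqrt

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n trials with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 6, 0.1428
prob = binomial_pmf(2, n, p)       # exactly two nonconforming units
mean = n * p                       # mu = n * p
std_dev = sqrt(n * p * (1 - p))    # sigma = sqrt(n * p * (1 - p))

print(f"P(X=2) = {prob:.4f}")
print(f"mean = {mean:.4f}, standard deviation = {std_dev:.4f}")
```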
Poisson Distribution
Poisson distribution is an application of the population knowledge to predict the sample behaviour.
Characteristics of Poisson Distribution:
• Describes the discrete data
• Used to analyze situations wherein the number of trials is large
• Deals with integers which can take any value
• Used where the probability of success in each trial is very small
• Used for predicting the number of defects
Poisson Distribution - Formula

The formula for the Poisson distribution is as follows:
$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where P(x) = probability of exactly x occurrences in a Poisson distribution
λ = mean number of occurrences during the interval
x = number of occurrences desired
e = base of the natural logarithm (approximately 2.71828)
Mean of a Poisson Distribution: $\mu = \lambda$
Standard Deviation of a Poisson Distribution: $\sigma = \sqrt{\lambda}$
Calculating Poisson Distribution – Example

The number of defects per shift has a Poisson distribution with λ = 4.2. Find the probability that
the second shift produces fewer than two defects. Therefore, we will seek
P(x<2) = P(x=0) + P(x=1)
Filling in the numbers, we have
$P(X=0) = \frac{e^{-4.2} (4.2)^{0}}{0!} = 0.015$
$P(X=1) = \frac{e^{-4.2} (4.2)^{1}}{1!} = 0.063$
$P(X<2) = 0.015 + 0.063 = 0.078$
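A minimal Python sketch of the same Poisson calculation follows. The function name `poisson_pmf` is an illustrative choice, and only the standard `math` module is used.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean number of occurrences is lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 4.2  # mean number of defects per shift
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)
p_fewer_than_two = p0 + p1

print(f"P(X=0) = {p0:.3f}")
print(f"P(X=1) = {p1:.3f}")
print(f"P(X<2) = {p_fewer_than_two:.3f}")
```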
Continuous Probability Distribution

The continuous probability distribution is characterized by the probability density function.
• A variable is said to be continuous if the range of possible values falls along a continuum.
Example: Loudness of cheering at a ball game, the weight of cookies in a package, length
of a pen, or the time required to assemble a car.
• These distributions help in predicting the sample behaviour observed in a population.
Normal Distribution
The Normal or Gaussian distribution is a continuous probability
distribution, illustrated as N (µ, σ).
• It has a higher frequency of values around the mean and
fewer occurrences away from it.
• It is used as a first approximation to describe real-valued
random variables that tend to cluster around a single mean
value.
• It is a bell-shaped curve and is symmetrical.
• The total area under the normal curve, that is, the probability that x is found somewhere in the distribution, is 1.
[Figure: Normal Distribution with Mean = 100 and Standard Deviation = 10]
In a normal distribution, to standardize comparisons of dispersion, a standard Z variable is utilized. The uses of Z value are as follows:
• It is unique for each probability within the normal distribution.
• It helps in finding probabilities of data points anywhere within the distribution.
• It is dimensionless with no units like mm, litres, coulombs, etc.
$Z = \frac{Y - \mu}{\sigma}$
where Z = number of standard deviations between Y and µ
Y = value of the data point in concern
µ = mean of the population
σ = standard deviation of the population
Q Suppose the time taken to resolve customer problems follows a normal distribution with the mean
value of 250 hours and standard deviation value of 23 hrs. What is the probability that a problem
resolution will take more than 300 hrs?
A Given:
● Y = 300
● µ = 250
● σ = 23
Using the formula: $Z = \frac{300 - 250}{23} = 2.17$
● From a Normal Distribution Table, the Z value of 2.17 corresponds to a cumulative area of 0.98499
● Thus, the probability that a problem can be resolved in less than 300 hrs is 98.5%
● The chance of a problem resolution taking more than 300 hours is 1 − 0.985 = 1.5%
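The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch of the same calculation; because it uses the unrounded Z value, the result differs very slightly from the table value for Z = 2.17.

```python
from statistics import NormalDist

mu, sigma, y = 250, 23, 300   # mean, standard deviation, value of interest (hours)

z = (y - mu) / sigma                     # number of standard deviations from the mean
p_less = NormalDist(mu, sigma).cdf(y)    # area under the curve to the left of y
p_more = 1 - p_less                      # area to the right of y

print(f"Z = {z:.2f}")
print(f"P(resolution < 300 hrs) = {p_less:.3f}")
print(f"P(resolution > 300 hrs) = {p_more:.3f}")
```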
Chi-Square Distribution
If we obtain a random sample X1, X2, …, Xn of size n from a population that is normally distributed with mean µ and finite variance σ², the random variable
$\chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}}$
is distributed as a chi-square distribution with n−1 degrees of freedom, where s² is the sample variance.
The formula for χ² will be useful later when we discuss hypothesis testing and confidence intervals.
T-Distribution
A t-distribution is most appropriate to be used when:
• The sample size <30;
• Population standard deviation is not known; and
• Population is approximately normal.
The t-distribution approaches normality as the sample size increases.
F-Distribution
The F-distribution is the distribution of a ratio of two Chi-square variables, each divided by its degrees of freedom. A specific F-distribution is denoted by the degrees of freedom for the numerator Chi-square and the degrees of freedom for the denominator Chi-square.
$F_{calculated} = \frac{S_1^{2}}{S_2^{2}}$
where S1 and S2 = standard deviations of the two samples

● If Fcalculated is 1, there is no difference in the variance
● If S1 > S2, S1² goes in the numerator so that the numerator is greater than the denominator (df1 = n1 – 1 and df2 = n2 – 1)
● Refer to the F-table to find the critical F value at α and the degrees of freedom of the samples of the two different processes (df1 and df2)
3.2 Inferential Statistics
3.2.1 Understanding Inference
Types of Statistics
Statistics refers to the science of collection, analysis, interpretation, and presentation of data. There are two
major types of statistics-Descriptive statistics and Inferential statistics.
• Descriptive Statistics
• Also known as Enumerative statistics
• Includes organizing, summarizing, and presenting the data
• Describes what's going on in the data
• Histograms, pie charts, box plots, etc., are the tools
• Inferential Statistics
• Also known as Analytical statistics
• Includes predicting and drawing conclusions
• Makes inferences from our data to more general conditions
• Hypothesis testing, scatter diagrams, etc., are the tools
Inferential statistics is a set of methods used to draw conclusions or inferences about characteristics of populations based on data from a sample. Examples of such characteristics are the mean calculated for a population and the standard deviation calculated for a population.
Statistical inference in a practical situation contains two elements:

• The inference
• A measure of its validity
3.2.3 Central Limit Theorem
The Central Limit Theorem (CLT) implies that for a sample size greater than 30, the sample mean is a close estimate of the population mean.
• When the sample size is greater than 30, the distribution of the sample mean is approximately normal.
• In such cases, the Standard Error of the Mean (SEM), which represents the variability between the sample means, is small.
$SEM = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}}$
Selecting a sample size also depends on the concept called Power of the Test.
The Central Limit Theorem concludes the following:

• The sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, no matter what the shape of the population distribution.
• This holds especially well for sample sizes over 30: as you take more samples, especially large ones, the graph of the sample means will look more like a normal distribution.
• CLT aids in making inferences from the sample statistics about the population
parameters irrespective of the distribution of the population.
• CLT becomes the basis for calculating the confidence interval for a hypothesis
test as it allows the use of a standard normal table.
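The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape), the sample size of 36 and the 2000 repetitions are arbitrary choices, and the seed is fixed only for repeatability. The spread of the simulated sample means should be close to the population standard deviation divided by the square root of the sample size.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)       # fixed seed, an arbitrary choice for repeatability
n, num_samples = 36, 2000    # assumed sample size and number of repeated samples

# Draw repeated samples from a skewed (exponential) population and keep each mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(num_samples)
]

observed_sem = stdev(sample_means)   # variability between the sample means
theoretical_sem = 1.0 / sqrt(n)      # population SD / sqrt(sample size)

print(f"Mean of sample means: {mean(sample_means):.3f}")
print(f"Observed SEM:    {observed_sem:.4f}")
print(f"Theoretical SEM: {theoretical_sem:.4f}")
```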
3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing
The steps involved in statistical inference are:

1. Define the problem objective precisely
2. Decide if the problem will be evaluated by a one-tail or two-tail test
3. Formulate a null hypothesis and an alternate hypothesis
4. Select a test distribution and a critical value of the test statistic reflecting the degree of uncertainty that can be tolerated (the alpha, α, risk)
5. Calculate a test statistic value from the sample information
6. Make an inference about the population by comparing the calculated value to the critical value. This step determines if the null hypothesis is to be rejected. If the null is rejected, the alternate must be accepted
7. Communicate the findings to interested parties
Statistical and Practical Significance of Hypothesis Test

The differences between a variable and its hypothesized value may be statistically significant but may not
be practical or economically meaningful.
Example: Based on the hypothesis test, Nutri Worldwide Inc. implemented a trading strategy.
The returns:
• Are economically significant when logical reasons are examined before implementation.
• May not be significant when the statistically proven strategy is implemented directly.
• May be economically insignificant due to taxes, transaction costs, and risks.
Examples of the Null Hypothesis and Alternate Hypothesis
1. A cement plant has found that the historical mean strength of cement is 25 units. The Company wants to
assess whether the mean strength continues to be the same.
• In the null hypothesis, we will assume that the mean strength (25 units) has not changed. Therefore the null and alternate hypotheses will be written as:
• Ho: µ = 25
• H1: µ ≠ 25
• The number of tails is 2, as we want to assess whether the mean strength has changed
2. We want to evaluate whether a new incentive scheme has increased the mean daily production of the
company.
• The historical mean is µo. In the null hypothesis, we will assume that the mean production level has not
changed.
• Therefore the null and alternate hypothesis would be written as
• Ho: µ = µo
• H1: µ > µo
• The number of tails =1 (right tail) as we want to assess whether the mean production has increased.
3. A company has appointed a new courier service. They wish to assess whether the package is delivered
faster than before.
• In the Null hypothesis, we will assume that the mean delivery time µo has not changed; the null and
alternate hypothesis will, therefore, be written as
• Ho: µ = µo
• H1: µ < µo
• The number of tails = 1 (left tail) as we want to assess whether the mean delivery time has reduced.
Null Hypothesis vs. Alternate Hypothesis

The conceptual differences between a null and an alternate hypothesis are as follows:
• Null Hypothesis
• Represented as H0
• Cannot be proved, only rejected
• Example: Movie is good
• Alternate Hypothesis
• Represented as Ha
• Challenges the null hypothesis
• Example: Movie is not good
If the null hypothesis is rejected, the alternate hypothesis is accepted.
What is Confidence Interval?

A confidence interval is a range of values, calculated from sample data, that is expected to contain the population parameter at a stated confidence level. Three confidence levels are commonly used: 95%, 90%, and 99%. The default is 95%, but 90% and 99% can also be used. The statistical term alpha is derived from the confidence level: α = 1 − 0.95, 1 − 0.90, or 1 − 0.99. The confidence interval is calculated using standard statistical formulas.
For example, suppose an estimate is needed for the average coating thickness for a population of 1000 circuit boards received from a supplier. Rather than measure the coating thickness on all 1000 boards, one might randomly pick 36 boards for measurement. Suppose the average coating thickness of these 36 boards is 0.003, and the standard deviation of the coating measurements is 0.0005. The standard deviation is assumed known from past experience. Determine the 95% confidence interval for the true mean.
• As the sample size is greater than 30, we use the Z table (available through a web search), from which Zα/2 = 1.96
• α = 0.05, X̄ = 0.003, σ = 0.0005, n = 36
• The confidence interval is calculated with the formula:
• $\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• Substituting the values in the formula, we obtain
• $0.003 - 1.96 \cdot \frac{0.0005}{\sqrt{36}} \le \mu \le 0.003 + 1.96 \cdot \frac{0.0005}{\sqrt{36}}$
• 0.00284 ≤ μ ≤ 0.00316
• Thus the 95% confidence interval for the mean is (0.00284, 0.00316)
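The interval can be reproduced with a short Python sketch. The function name `z_confidence_interval` is an illustrative choice; `NormalDist().inv_cdf` from the standard library replaces the Z-table lookup, so the Z value is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is assumed known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z(alpha/2), about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=0.003, sigma=0.0005, n=36)
print(f"95% CI for the mean: ({low:.5f}, {high:.5f})")
```

Passing `confidence=0.90` or `confidence=0.99` gives the other two commonly used intervals.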
3.3.2 Significance; Practical vs. Statistical
Comparing Two Situations – Asking “Are they different?”

• Ho: Null Hypothesis – There is no difference
• Ha: Alternate Hypothesis – There is a difference
1. Determine the hypothesis. The hypothesis is usually stated as “no difference”.
2. Calculate the P-value. The test type depends on what you want to know.
3. Is the P-value < 0.05?
• NO: Cannot reject the null hypothesis. No statistical evidence for a difference.
• YES: Reject the null hypothesis. Statistical evidence for a difference.
Truth Table
• Do not reject Ho when Ho is true: Correct decision
• Do not reject Ho when Ha is true: Type II error (β), or consumer risk
• Reject Ho when Ho is true: Type I error (α), or producer risk
• Reject Ho when Ha is true: Correct decision
The P-value is the probability, assuming Ho is true, of obtaining a result at least as extreme as the one observed; α is the accepted probability of making a Type I error. When α = 0.05, P-value < 0.05 is our judgment criterion.
We say that the decision is made at the 95% (1 − α) confidence level.
3.3.3 Risk; Alpha & Beta
Alpha risk is the risk of incorrectly deciding to reject the null hypothesis. If the confidence level is 95%, then the alpha risk is 5% or 0.05.
Alpha risk is also called False Positive and Type I Error.
Confidence Level = 1 - Alpha Risk

Alpha is called the significance level of a test. The level of significance is commonly between 1% and 10% but can be any value depending on your desired level of confidence or need to reduce Type I error.
Selecting 5% signifies that there is a 5% chance of concluding that a difference exists when it actually does not.
Beta risk is the risk that the decision will be made that the part is not defective when it really is.
If the power desired is 90%, then the Beta risk is 10%.
There is a 10% chance that the decision will be made that the part is not defective when in reality it is defective.
Power = 1 - Beta risk

Beta risk is also called False Negative and Type II Error.
Power is the probability of correctly rejecting the Null Hypothesis.
The Null Hypothesis is technically never proven true. It is "failed to reject" or "rejected."
"Failed to reject" does not mean accept the null hypothesis since it is established only to be proven false by testing
the sample of data.
Hypothesis Testing Possible Scenarios
• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from two or multiple samples (for Y)
• Take two or more samples for the Y data from the population and conduct appropriate test(s) to draw inferences
about the population
• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from one sample (for X and Y)
• Take one sample for the X and Y data from the respective populations and conduct appropriate test(s) to draw
inferences about the populations
• During Analyse phase, study or establish a correlation between X and Y
• This helps in understanding which X has a max impact on Y and therefore shortlist critical Xs
• During Improve phase, repeat the appropriate tests above to verify and confirm process improvements
3.3.4 Types of Hypothesis Testing
There are 2 types of hypothesis testing:

• Parametric hypothesis testing
• Non-parametric hypothesis testing
Parametric hypothesis testing focuses on the mean and standard deviation of the sample; non-parametric hypothesis testing focuses on the median.
3.4 Hypothesis Testing with Normal Data
Examples of Parametric Hypothesis Testing
• 1-Sample t-Test (Mean vs. Target): Compares the mean of a process with a target value, such as an ideal goal, to determine whether they differ.
• 1-Sample Standard Deviation: Compares the standard deviation of a process with a target value, such as a benchmark, to determine whether they differ. It is often used to evaluate how consistent a process is.
• 2-Sample t (Comparing 2 Means): Two sets of different items are measured, each under a different condition, so the measurements of one sample are independent of the measurements of the other. Example: with two populations and two samples, this test can determine whether the average expenditure of male customers is equal to the average expenditure of female customers.
• Paired t: The same set of items is measured under 2 different conditions; therefore, the 2 measurements of the same item are dependent or related to each other.
• 2-Sample Standard Deviation: Used to compare the standard deviations of 2 samples.
• Standard Deviation Test: Used to compare the standard deviations of more than 2 samples.
• Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller
sample size (n < 30). Both methods assume a normal distribution of the data, but the z-tests are most useful when
the standard deviation is known.
• A t-test is usually done to compare the means of two treatments. For instance, to compare the performance of a machine before and after some adjustments are made to it, the mean of one sample of products taken prior to the adjustments can be compared to the mean of another sample taken after the adjustments.
• A hypothesis test based on the t-test is conducted using the degrees of freedom and the confidence level, but whenever two sample means are compared there is room for error. If alpha = 0.05, there is a 5% chance of rejecting a null hypothesis that happens to be true.
• If, for instance, three sample means A, B, and C are compared using the t-test at a 95% confidence level, two factors are compared at a time: A with B, then A with C, and then B with C. Each comparison carries a 0.05 probability of rejecting a true null hypothesis. Therefore, when three factors are compared using the t-test, the probability of making a Type I error is inflated. To limit this inflation, we can use analysis of variance (ANOVA).
• ANOVA is a hypothesis test used when more than two factor means are being compared.
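The inflation of Type I error can be quantified with a small sketch. It rests on one simplifying assumption that the text does not state: the pairwise comparisons are treated as independent, so the chance of at least one false rejection is 1 − (1 − α) raised to the number of comparisons. Real pairwise tests share data, so the figure is an approximation.

```python
from itertools import combinations

alpha = 0.05
groups = ["A", "B", "C"]

# Every pair of groups is compared once: A-B, A-C, B-C.
pairs = list(combinations(groups, 2))

# Assumption: comparisons are independent, each with Type I error rate alpha.
familywise_error = 1 - (1 - alpha) ** len(pairs)

print("Comparisons:", pairs)
print(f"Chance of at least one Type I error: {familywise_error:.4f}")
```

With three groups the overall risk is roughly 14%, nearly three times the 5% chosen for each individual test.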
3.4.1 1 & 2 Sample t-tests
1-Sample t-test
• Use 1-Sample t to estimate the mean of a population and to compare it to a target value or a reference value when you do not know the standard deviation of the population. Using this analysis, you can do the following:
• Determine whether the population mean differs from the hypothesized mean that you specify.
• Calculate a range of values that is likely to include the population mean.
• For example, a quality analyst uses a 1-sample t-test to determine whether the average thread length of
bolts differs from the target of 20 mm. If the mean differs from the target, the analyst uses the confidence
interval to determine how large the difference is likely to be and whether that difference has practical
significance.
• Where to find this analysis
• To perform a 1-sample t-test, choose Stat > Basic Statistics > 1-Sample t.
Application of 1 Sample t-test

An economist wants to determine whether the monthly energy cost for families had changed from the
previous year when the mean cost per month was $200. The economist randomly samples 25 families and
records their energy costs for the current year.
The economist performs a 1-sample t-test to determine whether the monthly energy cost differs from $200.
• Open the sample data, Family Energy Cost.MTW.
• Choose Stat > Basic Statistics > 1-Sample t.
• From the drop-down list, select One or more samples, each in a column and enter Energy Cost.
• Select Perform hypothesis test.
• In the Hypothesized mean, enter 200.
• Click OK.
1-Sample t Minitab Output

• The null hypothesis states that the mean of the energy costs is $200.
Because the p-value is 0.000, which is less than the significance level of
0.05, the economist rejects the null hypothesis and concludes that the
average monthly energy cost for families differs from $200. The 95% CI
indicates that the population mean is likely to be greater than $200.
Comparison of Means of Two Processes

• Means of two processes are compared to:
• Understand the significant difference in the outcome of the two processes;
• Understand whether a new process is better than an old process;
• Understand whether the two samples belong to the same population or a different population; and
• Benchmark the existing process with another process.
2-Sample t-test
The average heights of men in two different sets of people are compared to see if the means are significantly different. For this test, the sample sizes, means, and variances are required to calculate the value of t. Two samples of sizes n1 = 125 and n2 = 110 are taken from the two populations. The mean of sample 1 is 167.3 and the mean of sample 2 is 165.8. The standard deviations of samples 1 and 2 are 4.2 and 5.0 respectively.
Paired Comparison Hypothesis Test for Means (Theoretical)

The two-mean t-test with unequal variances is:
• H0: μ1 = μ2 against Ha: μ1 ≠ μ2
• Two samples of sizes n1 = 125 and n2 = 110 are taken from the two populations
• X̄1 = 167.3, X̄2 = 165.8, s1 = 4.2, s2 = 5.0 are the sample means and SDs respectively
• Compute the test statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^{2}/n_1 + s_2^{2}/n_2}} = \frac{167.3 - 165.8}{\sqrt{4.2^{2}/125 + 5.0^{2}/110}} \approx 2.47$
• Reject H0 at the level of significance α if |Computed t| > tDF,α/2
• Since |t| ≈ 2.47 exceeds t223, 0.025 = 1.96, the null hypothesis is rejected at the 5% level of significance
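The test statistic for this example can be computed with the following sketch. The function name `welch_t` is an illustrative choice. The degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption of this sketch about how the unequal-variance test is carried out; it comes out somewhat lower than the 223 quoted above, but the critical value stays close to 1.96 for degrees of freedom this large.

```python
from math import sqrt

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic and approximate degrees of freedom (unequal variances)."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # variance of each sample mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t_stat, df = welch_t(167.3, 4.2, 125, 165.8, 5.0, 110)
critical = 1.96   # critical value taken from the example above

print(f"t = {t_stat:.2f}, approximate df = {df:.0f}")
print("Reject H0" if abs(t_stat) > critical else "Fail to reject H0")
```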
3.4.2 1 Sample Variance
Hypothesis Test for 2 Variances – Example

Susan is trying to compare the standard deviations of the earnings of two companies. According to her, the earnings of Company A are more volatile than those of Company B. She has obtained earnings data for the past 31 years for Company A, and for the past 41 years for Company B. She finds that the sample standard deviation of Company A’s earnings is $4.40 and of Company B’s earnings is $3.90. Determine whether the earnings of Company A have a greater standard deviation than those of Company B at the 5% level of significance.
Hypothesis Test for Equality of Variance – F-test Example

The degrees of freedom for company A and company B are:
• dfA (degrees of freedom of A) = 31 – 1 = 30
• dfB (degrees of freedom of B) = 41 – 1 = 40
The critical value from F-table equals 1.74. The null hypothesis is rejected if the F-test statistic is greater than 1.74.
Calculation of F-test statistic: F= (SA2/S 2) = 4.402/3.902 = 1.273
Results: The F-test statistic (1.273) is not greater than the critical value (1.74). Therefore, at 5% significance level,
the null hypothesis cannot be rejected.
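The same comparison can be written as a short script. This is a minimal sketch: the helper name `f_statistic` is an assumption, and the critical value 1.74 is taken from the F-table lookup quoted above rather than computed.

```python
def f_statistic(sd_numerator, sd_denominator):
    """Ratio of two sample variances, given the sample standard deviations."""
    return sd_numerator ** 2 / sd_denominator ** 2

df_a = 31 - 1  # degrees of freedom for Company A
df_b = 41 - 1  # degrees of freedom for Company B
F_CRITICAL = 1.74  # F-table value for (30, 40) df at the 5% level, from the text

f_value = f_statistic(4.40, 3.90)
reject_h0 = f_value > F_CRITICAL

print(f"F = {f_value:.3f} with ({df_a}, {df_b}) degrees of freedom")
print("Reject H0" if reject_h0 else "Cannot reject H0")
```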
64
3.4.3 One Way Anova
• A chemical engineer wants to compare the hardness of four blends of paint. Six samples of each paint blend
were applied to a piece of metal. The pieces of metal were cured. Then each sample was measured for hardness.
In order to test for the equality of means and to assess the differences between pairs of means, the engineer uses
one-way ANOVA with multiple comparisons.
• Open the sample data, PaintHardness.MTW.
• Choose Stat > ANOVA > One-Way.
• Select Response data are in one column for all factor levels.
• In Response, enter Hardness.
• In Factor, enter Paint.
• Click the Comparisons button, then select Tukey.
• Click OK in each dialog box.
65
3.4.3 One Way Anova
One Way Anova Minitab Output
66
3.4.3 One Way Anova
ANOVA- Test for equal variances
67
3.4.3 One Way Anova
68
3.4.3 One Way Anova
[Figure: Test for Equal Variances: Hardness vs Paint. 95% Bonferroni confidence intervals for StDevs of Blend 1 to Blend 4; Bartlett’s Test P-Value = 0.441]
69
3.4.3 One Way Anova
70
3.4.3 One Way Anova
One Way Anova Interpretation

The p-value for the paint hardness ANOVA is less than 0.05. This result indicates that the hardness of the paint
blends differs significantly. The engineer knows that some of the group means are different.
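The F statistic behind a one-way ANOVA can be sketched without Minitab. The hardness values below are made-up illustrative numbers, not the Minitab sample data, and the function name `one_way_anova` is an assumption. The sketch covers only the overall F test, not the Tukey comparisons.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical hardness readings: four blends, six samples each
blends = [
    [14.2, 15.1, 13.8, 14.9, 15.4, 14.6],
    [9.1, 10.3, 8.7, 9.8, 10.0, 9.4],
    [12.5, 13.0, 12.1, 13.4, 12.8, 12.2],
    [18.0, 17.2, 18.9, 17.6, 18.4, 18.1],
]
f_value, df_between, df_within = one_way_anova(blends)
print(f"F = {f_value:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F relative to the F-table value for the same degrees of freedom corresponds to the small p-value described in the interpretation.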
3.5 Hypothesis Testing with
Non-Normal Data
72
3.5 Hypothesis Testing with Non-Normal Data
Non-Parametric Hypothesis Test
• Non-parametric tests are used when data are not normal. Examples of non-parametric tests,
most of which focus on the median, are given below:
• Mann-Whitney
• Kruskal Wallis
• Mood’s Median
• Friedman
• 1 Sample Sign
• 1 Sample Wilcoxon
• One and Two Sample Proportion
• Chi Square tests
73
3.5.1 Mann-Whitney Test
Mann-Whitney Test Example

• A state highway department uses two brands of paint for painting stripes on roads. A highway official
wants to know whether the durability of the two brands of paint differs. For each paint, the
official records the number of months the paint persists on the highway.
• The official performs a Mann-Whitney test to determine whether the median number of months that
the paint persists differs between the two brands.
• Open the sample data, HighwayPaint.MTW.
• Choose Stat > Nonparametrics > Mann-Whitney.
• In First Sample, enter Brand A.
• In Second Sample, enter Brand B.
• Click OK.
74
3.5.1 Mann-Whitney Test
Interpretation using P Values of Mann-Whitney Test

The null hypothesis states that the difference in the median
number of months that the paint persists between the two
brands is 0. Because the p-value is 0.0019, which is less than the
significance level of 0.05, the official rejects the null hypothesis.
The official concludes that the difference in the median number
of months the paint persists between the two brands is not 0.
The 95.5 percent CI indicates that the population median of
Brand B is likely to be greater than that of Brand A.
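The statistic behind the Mann-Whitney test can be sketched by counting pairs. The durations below are hypothetical, not the Minitab sample data, and the function names are assumptions. The p-value uses a plain normal approximation without tie or continuity correction, so it will not match Minitab's output exactly.

```python
import math
from statistics import NormalDist

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts pairs where a > b, ties count as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

def normal_approx_p_value(u, n_a, n_b):
    """Two-sided p-value for U using the normal approximation (no tie correction)."""
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical months of durability for each brand
brand_a = [35.6, 37.2, 36.1, 34.9, 38.0, 36.8]
brand_b = [38.4, 39.1, 37.9, 40.2, 38.8, 39.5]

u_a, u_b = mann_whitney_u(brand_a, brand_b)
p_value = normal_approx_p_value(u_a, len(brand_a), len(brand_b))
print(f"U_a = {u_a}, U_b = {u_b}, approximate p-value = {p_value:.4f}")
```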
75
3.5.2 Kruskal-Wallis Test
The Kruskal-Wallis test is also a non-parametric test. It is used to test whether two or more samples originate from the same population.
Characteristics of the Kruskal-Wallis test are as follows:
• It is a one-way analysis of variance by ranks.
• Medians of two or more samples are compared to determine whether the samples come from the same population.
• Unlike the analogous one-way analysis of variance, it does not assume the normal distribution of the residuals.
• The null hypothesis is that the medians of all the groups are equal, and
• The alternative hypothesis is that the population median of at least one group is different from
that of at least one other group.
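The "analysis of variance by ranks" idea can be sketched as follows: pool the data, rank it, and compare the rank sums of the groups. The data are hypothetical and the function names are assumptions. This sketch computes the H statistic without the adjustment for ties.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, converted to 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (not adjusted for ties) for a list of samples."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

# Hypothetical samples from three groups
groups = [[12, 15, 11, 19], [22, 25, 31, 27], [14, 16, 13, 18]]
h_value = kruskal_wallis_h(groups)
print(f"H = {h_value:.2f} with {len(groups) - 1} degrees of freedom")
```

H is then compared with a chi-square distribution with (number of groups − 1) degrees of freedom.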
76
Example of Kruskal-Wallis Test and Mood’s Median Test
A health administrator wants to compare the number of unoccupied beds for three hospitals in the same city. The administrator
randomly selects 11 different days from the records of each hospital and enters the number of unoccupied beds for each day.
To determine whether the median number of unoccupied beds differs, the administrator uses the Kruskal-Wallis test.
1. Open the sample data, HospitalBeds.MTW.

2. Choose Stat > Nonparametrics > Kruskal-Wallis.
3. In Response, enter Beds.
4. In Factor, enter Hospital.
5. Click OK.
77

The sample medians for the three hospitals are 16.00, 31.00, and 17.00. The average ranks show that hospital 2 differs the
most from the average rank for all observations and that this hospital is higher than the overall median.
Both p-values are less than 0.05. The p-values indicate that the median number of unoccupied beds differs for at least one
hospital.
78
Kruskal-Wallis Test: Beds versus Hospital

Descriptive Statistics
Hospital    N    Median    Mean Rank    Z-Value
1          11      16        14.0        -1.28
2          11      31        23.3         2.65
3          11      17        13.7        -1.37
Overall    33                17.0
79
Kruskal-Wallis Test: Beds versus Hospital

Test
Null hypothesis          H₀: All medians are equal
Alternative hypothesis   H₁: At least one median is different

Method                   DF    H-Value    P-Value
Not adjusted for ties     2     7.05       0.029
Adjusted for ties         2     7.05       0.029
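The reported p-value can be checked by hand. H is compared with a chi-square distribution, and for exactly 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2). The sketch below relies on that special case only; the function name is an assumption.

```python
import math

def chi_square_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)

h_value = 7.05  # H-Value from the Kruskal-Wallis output
p_value = chi_square_upper_tail_df2(h_value)

print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```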
80
Mood Median Test: Beds versus Hospital
Mood median test for Beds

Chi-Square = 7.52 DF = 2 P = 0.023
Individual 95.0% CIs

Hospital N≤ N> Median Q3-Q1 -----+---------+---------+---------+-
1 7 4 16.0 23.0 (---------*------------)
2 2 9 31.0 12.0 (----*-------)
3 8 3 17.0 24.0 (-----------*-----------)
-----+---------+---------+---------+-
10 20 30 40
Overall median = 24.0
81
3.5.3 Mood’s Median Test
The Mood’s median is a non-parametric test that is used to test the equality of medians from two or
more different populations. This test works when:
• The output (Y) variable is continuous, discrete-ordinal or discrete-count, and
• The input (X) variable is discrete with two or more attributes.
The steps involved in Mood’s Median test are as follows:
1. Find the median of the combined data set.
2. Find the number of values in each sample > median.
3. Form a contingency table.
4. Find the expected value for each cell.
5. Find the chi-square value.
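The last three steps can be sketched in a few lines, using the N≤ and N> counts from the Mood median output for the hospital beds example as the contingency table. The function name `chi_square_from_table` is an assumption; the first two steps (finding the overall median and counting) are taken as already done.

```python
def chi_square_from_table(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected value = (row total)(column total) / overall total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df

# Rows: hospitals 1 to 3; columns: N <= overall median, N > overall median
counts = [[7, 4], [2, 9], [8, 3]]
chi_square, df = chi_square_from_table(counts)
print(f"Chi-Square = {chi_square:.2f}, DF = {df}")
```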
82
3.5.4 Friedman Test
The Friedman test is a non-parametric test that makes no assumptions about the shape of the
distribution from which the sample originates.
• It allows smaller sample data sets to be analysed, and
• Unlike ANOVA, it does not require the dataset to be randomly sampled from normally distributed
populations with equal variances.
Note: The null hypothesis of the test is that the population medians of all treatments are
equal.
83
3.5.5 1 Sample Sign Test
The 1 Sample Sign test is the simplest of all the non-parametric tests that can be used instead of a
one sample t test.
• Here, H0 states that the median of the population from which the sample is drawn equals the
hypothesized (assumed) median.
Steps involved in 1 Sample Sign test are as follows:
1. Count the number of positive values: values that are larger than the hypothesized median.
2. Count the number of negative values: values that are smaller than the hypothesized median.
3. Test the values: check if there are significantly more positives (or negatives) than expected.
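These three steps can be sketched with an exact binomial calculation: under H0 each value is equally likely to fall above or below the hypothesized median. The scores are hypothetical and the function name `sign_test` is an assumption. Values equal to the hypothesized median are dropped.

```python
import math

def sign_test(values, hypothesized_median):
    """Return (positives, negatives, two-sided p-value) for the 1 sample sign test."""
    positives = sum(1 for v in values if v > hypothesized_median)
    negatives = sum(1 for v in values if v < hypothesized_median)
    n = positives + negatives  # values equal to the median are ignored
    k = min(positives, negatives)
    # Probability of k or fewer signs of the rarer kind when p = 0.5
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2.0 * tail)
    return positives, negatives, p_value

# Hypothetical scores tested against a hypothesized median of 3.7
scores = [3.9, 4.1, 3.5, 4.4, 4.0, 3.8, 4.2, 3.6, 4.3, 4.5]
positives, negatives, p_value = sign_test(scores, 3.7)
print(f"positives = {positives}, negatives = {negatives}, p-value = {p_value:.4f}")
```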
84
3.5.6 1 Sample Wilcoxon Test
The 1 Sample Wilcoxon test, also known as the Wilcoxon Signed Rank test, is a non-parametric test.
This test is:
• The non-parametric equivalent of the parametric One Sample t-Test, and
• More powerful than the non-parametric 1 Sample Sign Test.
85
Characteristics of 1 Sample Wilcoxon Test

Some characteristics of this test are as follows:
• It assumes the existing sample is randomly taken from a population, with a symmetric
frequency distribution around the median, and
• The symmetry can be observed with a histogram, or by checking if the median and mean are
approximately equal.
The conclusion in this test: if the hypothesized median is consistent with the sample, do not
reject the null hypothesis. If not, reject the null hypothesis.
86
1 Sample Wilcoxon Test - Example

An example of the 1 Sample Wilcoxon test is shown.
The Median customer satisfaction score of an organization has always been 3.7 and the management wants to
see if this has changed. They conducted a survey and got the results grouped by the customer type.
Conclusion:
• If the sample is consistent with median = 3.7, do not reject H0
• If the sample indicates median ≠ 3.7, reject H0
• α = 0.05
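The survey data for this example are not reproduced here, so the sketch below uses hypothetical scores to show how the signed-rank statistic is formed. The function name is an assumption, and only the rank sums are computed; the p-value would come from a Wilcoxon table or from statistical software.

```python
def wilcoxon_signed_rank(values, hypothesized_median):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Differences from the hypothesized median; zero differences are dropped
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    ordered = sorted(diffs, key=abs)
    w_plus = 0.0
    w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        # Group differences with the same absolute value so they share a rank
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for d in ordered[i:j + 1]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j + 1
    return w_plus, w_minus

# Hypothetical satisfaction scores tested against the historical median of 3.7
scores = [4.0, 3.0, 5.5, 3.5, 4.6, 4.9, 3.1, 4.4]
w_plus, w_minus = wilcoxon_signed_rank(scores, 3.7)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

Rank sums that are far apart suggest that the median has shifted away from 3.7.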
87
3.5.7 One and Two Sample Proportion Test
One and Two Sample Proportion

1 Proportion Test: Analyzes the difference between a sample proportion and a target
2 Proportion Test: Analyzes the difference between two independent sample proportions
88
One and Two Sample Proportion
89
Example of Hypothesis Test-1 Proportion

A marketing analyst wants to determine whether mailed advertisements for a new product result in a response
rate different from the national average. A random sample of 1000 households is chosen to receive
advertisements. Of the 1000 households sampled, 87 make a purchase after receiving the advertisement.
The analyst performs a 1 proportion test to determine whether the proportion of households that made a
purchase is different from the national average of 6.5%.
1. Choose Stat > Basic Statistics > 1 Proportion.
2. From the drop-down list, select Summarized data.
3. In Number of events, enter 87.
4. In Number of trials, enter 1000.
5. Select Perform hypothesis test.
6. In Hypothesized proportion, enter 0.065.
7. Click OK.
90
Interpretation of 1 Sample Proportion Test

• The null hypothesis states that the proportion of households
that make a purchase equals 0.065. Because the p-value is
0.008, which is less than the significance level of 0.05, the
analyst rejects the null hypothesis. The results indicate that
the proportion of households that make a purchase is
different from the national average of 6.5%.
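A rough check of this result can be made with the normal approximation to the binomial. This is a sketch, not the method Minitab uses for the output above, so the p-value differs somewhat from the reported 0.008 while leading to the same decision. The function name is an assumption.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(events, trials, hypothesized_proportion):
    """Return (sample proportion, z, two-sided p-value) using the normal approximation."""
    p_hat = events / trials
    # Standard error under H0 uses the hypothesized proportion
    se = math.sqrt(hypothesized_proportion * (1 - hypothesized_proportion) / trials)
    z = (p_hat - hypothesized_proportion) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

# 87 purchases out of 1000 households, tested against the national average of 6.5%
p_hat, z, p_value = one_proportion_z_test(87, 1000, 0.065)
print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}, approximate p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```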
91
3.5.8 Chi-Square Distribution
The Chi-square distribution (χ²-distribution) or Chi-squared:

• Is a widely used probability distribution in inferential statistics;
• Needs one sample for the test to be conducted; and
• With k degrees of freedom, is the distribution of the sum of the squares of k independent standard
normal random variables.
χ² Calculated = Σ (fo − fe)² / fe
Where,
• χ² Calculated = chi-square index
• fo = An observed frequency
• fe = An expected frequency
92
Chi-Square Test - Example

The Australian hockey team wishes to analyze its wins at home and abroad against four different countries.
To analyze the wins, the data has two classifications: where the match was played (in Australia or abroad)
and the country played against.
• The table is called a 2 X 4 contingency table.
• Expected frequency for each of the observed frequencies = (row total)(column total)/overall total.
• Table columns: Sample Statistics; Estimated Population Parameters.
Example: The observed frequency of 3 wins against South Africa in Australia would convert to the expected
frequency of (21 / 31) * 5 = 3.39
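The expected-frequency rule for a single cell can be written out directly. The sketch uses only the totals quoted in the example (row total 21, column total 5, overall total 31); the function name is an assumption.

```python
def expected_frequency(row_total, column_total, overall_total):
    """Expected cell frequency under independence of the two classifications."""
    return row_total * column_total / overall_total

# Wins in Australia against South Africa: row total 21, column total 5, overall total 31
expected = expected_frequency(21, 5, 31)
observed = 3
contribution = (observed - expected) ** 2 / expected  # this cell's share of the chi-square index

print(f"expected = {expected:.2f}, contribution to chi-square = {contribution:.3f}")
```

Repeating this for all eight cells and adding the contributions gives the calculated chi-square index.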
93
The table is populated by:

• Calculating and adding the estimated population parameters;
• Estimating the observed frequency; and
• Calculating the final chi-square index.
94
H0: Proportion of wins in Australia or abroad is independent of the country played against
Ha: Proportion of wins in Australia or abroad is dependent on the country played against
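χ2 Critical = 6.251 (3 degrees of freedom for the 2 X 4 table) and
χ2 Calculated = 1.36
Result: Since the calculated value is less than the critical value, H0 cannot be rejected. There is no evidence
that the proportion of wins of the Australian hockey team depends on the country played against or the place.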
95
Chi-Square Test – Example: Interpretation of Results

There is a different chi-square distribution for each different number of degrees of freedom.
For chi- square distribution, degrees of freedom are calculated as per the number of rows and
columns in the contingency table.
The purpose of the Chi-square test is to test the hypothesis:
H0: The data follow a specified distribution
HA: The data do not follow a specified distribution
96
To conduct the Chi-square test the steps are given below:

1. State the null and alternative hypothesis (H0, HA).
2. Arrange a random sample of size n into a frequency histogram of k class intervals.
3. Determine Oi = observed frequency in the ith class interval.
4. Determine Ei = expected frequency in the ith class interval using the hypothesis distribution.
5. State the α value.
6. Compute the test statistic χ²0 = Σ (i = 1 to k) (Oi − Ei)² / Ei
7. Compute the critical value. Reject H0 if χ²0 > χ²α,k−p−1. The value of p is the number of
parameters estimated.
8. State the conclusion of the test.
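The steps above can be sketched for a simple case. The data are hypothetical (60 rolls of a die tested against a uniform distribution, so no parameters are estimated and p = 0), the function name is an assumption, and the critical value 11.070 is the standard chi-square table value for α = 0.05 with 5 degrees of freedom.

```python
def chi_square_goodness_of_fit(observed, expected, parameters_estimated=0):
    """Return (test statistic, degrees of freedom) for k class intervals."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - parameters_estimated - 1  # k - p - 1
    return statistic, df

# Hypothetical data: 60 rolls of a die, H0: all six faces are equally likely
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6
ALPHA = 0.05
CHI_SQUARE_CRITICAL = 11.070  # table value for alpha = 0.05, df = 5

statistic, df = chi_square_goodness_of_fit(observed, expected)
print(f"chi-square = {statistic:.2f}, df = {df}, alpha = {ALPHA}")
print("Reject H0" if statistic > CHI_SQUARE_CRITICAL else "Cannot reject H0")
```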
97
Chi-Square Test for Association
98
99
