Six Sigma Green Belt 3.ANALYSE (IASSC)

The key takeaways are about conducting various Lean analysis tools like VA/NVA analysis and understanding statistical concepts like Chi-Square tests.

The purpose of conducting a VA/NVA analysis is to identify and eliminate hidden costs that do not add value to the customer by reducing unnecessary process complexity and errors.

The different types of activities classified in a VA/NVA analysis are value-added, business non-value added, and non-value added activities.

Six Sigma

www.invenislearning.com
3.0 Analyze Phase

3.1 Patterns of Variation

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis
• The objective of the VA/NVA analysis is to:
• Identify and eliminate the hidden costs that do not add value to the customer
• Reduce unnecessary process complexity, and thus errors

• Method:
• Classify each process step as value-added (also known as "customer value-add"), business non-value-add
(sometimes called "required waste"), and non-value-add
• Add up the time spent in each category
• Decide what to do next.
• Value-add tasks should be optimized and standardized
• Business non-value-add tasks should be checked with the customer and, where possible, minimized or
eliminated
• Non-value-add activities should be eliminated
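
The method above (classify each step, then add up the time per category) can be sketched as follows. This is a minimal illustration only: the step names, categories, and durations are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical process steps: (name, category, minutes). Illustrative values only.
# VA = value-add, BNVA = business non-value-add, NVA = non-value-add.
steps = [
    ("Enter order", "VA", 5),
    ("Wait for approval", "NVA", 30),
    ("Compliance check", "BNVA", 10),
    ("Assemble product", "VA", 20),
    ("Rework defects", "NVA", 15),
]


def time_by_category(process_steps):
    """Add up the time spent in each category."""
    totals = defaultdict(float)
    for _name, category, minutes in process_steps:
        totals[category] += minutes
    return dict(totals)


totals = time_by_category(steps)
total_time = sum(totals.values())
for category, minutes in totals.items():
    print(f"{category}: {minutes:.0f} min ({minutes / total_time:.0%} of total)")
```

The share of time in each category then guides the "decide what to do next" step.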

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis
• Value-Added (Customer Value-Added)
• Must be performed to meet customer needs
• Adds form or feature to the service
• Enhances service quality, enables on-time or more competitive delivery, or has a positive impact on
price competition
• Customers would be willing to pay for this work if they knew you were doing it

• Non-Value-Added
• Rework, duplication, waiting, etc.

• Business Value-Added (also called business non-value-add)
• Internal requirements, e.g. compliance

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis
• Lead Time
• The time between order and delivery
• Cycle Time (C/T)
• The time taken at each step to create a product/service element
• Takt Time
• The pace of work needed to match the customer demand rate
• Process Time (P/T)
• The time taken to produce one item when one operator is working on one product at a time; in
that case it equals C/T. In batch processing, C/T = (P/T) / no. of items produced.

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis
Value stream pinpoints value add and non value add activities

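[Figure: value stream map. Forecasts and demand flow between Suppliers, Production, Sales, and Customer.
Material flows from Components through Subassembly, Final Assembly, Test, Stage, and Ship, with inventory (I) before each step.
Waiting times: 4 weeks, 4 days, 3 days, 5 days, 10 days (total 42 days).
Processing times: 20 min, 42 min, 10 min, 15 min, 5 min (total 92 minutes).]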

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis
[Figure: the same value stream map annotated with problem areas: high variation, high defect rate,
excessive inventory, and long set-up times.]

Lean Tools - Value Add (VA) and Non-Value Add
(NVA) Analysis

What steps can I modify to deliver an improved process to my customers?

• Eliminate: Stop doing the process step entirely
• Combine: Flow by eliminating the wait/inventory between 2 steps
• Pull: Control inventory between steps at a fixed level and only produce to that level
• Separate from the critical path: Perform steps in parallel
• Mitigate the impact (inventory/error rate etc.): Improve the performance/predictability of a highly variable process

Takt Time
• Takt Time Calculation
• The takt time is the amount of available work time divided by the customer demand during that time period
• Example:
• Work Schedule: 8 hours/day = Total of 480 minutes in a day
• No. of shipments to handle in a day = 150
• Takt time = 480 (minutes)/150 = One shipment for every 3.2 minutes
• Any VA step in a process map that takes longer than the takt time is considered a time trap
• Divide the total time for the process by the takt time to get a rough estimate of the staff required to operate the process
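
The takt time calculation above can be sketched in a few lines. The 480 minutes and 150 shipments come from the slide example; the 15 minutes of total process time per shipment and the function names are assumptions made only for illustration.

```python
import math


def takt_time(available_minutes, demand_units):
    """Available work time divided by customer demand in the same period."""
    return available_minutes / demand_units


def rough_staffing(total_process_minutes, takt_minutes):
    """Rough head count: total process time per unit divided by takt time, rounded up."""
    return math.ceil(total_process_minutes / takt_minutes)


takt = takt_time(480, 150)  # slide example: 8-hour day, 150 shipments
print(f"Takt time: {takt:.1f} minutes per shipment")

# Hypothetical: assume each shipment needs 15 minutes of total work.
print("Rough staff estimate:", rough_staffing(15, takt))
```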

3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

Multi-Vari Studies
Multi-Vari studies analyze variation, investigate process stability, identify investigation areas, and
break down the variation.
They classify variation sources into three major types:

• Positional: Variations within a single unit where variation is due to location.
Examples: Pallet stacking in a truck, temperature gradient in an oven, the variation observed
from cavity-to-cavity within a mold, a region of a country, the line on the invoice
• Cyclical: Variations among sequential repetitions over a short time.
Examples: Every n'th pallet broken, batch-to-batch variation, lot-to-lot variation,
invoices received day-to-day and account activity week-to-week
• Temporal: Variations which occur over longer periods of time.
Examples: Process drift, performance before and after breaks, seasonal and shift based
differences, month-to-month closings, and quarterly returns


3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

• Use Multi-Vari Chart as a preliminary tool to investigate variation in your data, including cyclical
variations and interactions between factors.
• A multi-vari chart provides a graphical representation of the relationships between factors and a
response.
• The multi-vari chart displays the means at each factor level for every factor. In Minitab, each multi-vari
chart can display up to four factors.
• For Example, a manufacturer produces plastic pipes using two different machines with three
temperature settings. The quality engineer is concerned about the consistency of pipe diameters from
the different machines and settings. The engineer creates a multi-vari chart to investigate the variation in
pipe diameters.

3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

Example of Multi-Vari Chart


An engineer wants to assess the effect of sintering time on the compressive strength of three different metals. The
engineer measures the compressive strength of five specimens of each metal type at each sintering time: 100 minutes,
150 minutes, and 200 minutes.
• The engineer creates a multi-vari chart to look for possible trends and interactions in the data.
• Open the sample data, SinteringTime.MTW.
• Choose Stat > Quality Tools > Multi-Vari Chart.
• In Response, enter Strength.
• In Factor 1, enter SinterTime.
• In Factor 2, enter MetalType.
• Click OK.
Interpret the results
• The multi-vari chart indicates a possible interaction between the type of metal and the length of sintering time. The
greatest compressive strength for Metal Type 1 is obtained by sintering for 100 minutes, for Metal Type 2 by sintering
for 150 minutes, and for Metal Type 3 by sintering for 200 minutes.

3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

Multi-Vari Chart Example


The data shows that the strength varies differently across sintering times for different metal types, indicating
an interaction.

3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

Create Multi-Vari Chart


The five steps to create a Multi-Vari chart are:

1. Select Process and Characteristics
Example: Select the process where the plate is being manufactured and measure its thickness within a
specified range.
2. Decide Sample Size
Example: Sample size is five pieces from each equipment and the frequency of data collection is every
two hours.
3. Create a Tabulation Sheet
Example: The tabulation sheet with data records contains the columns with time, equipment number, and
thickness as headers.
4. Plot the Chart
Example: Chart is plotted with time on the X axis and the plate thickness on the Y axis.
5. Link the Observed Values
Example: The observed values are linked by appropriate lines.
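
As a rough sketch of the tabulation step, the snippet below groups thickness readings by equipment and time and computes the mean of each cell, which is the kind of value a multi-vari chart plots and links with lines. The readings are hypothetical, and only two pieces per cell are used to keep the example short (the slide example uses five).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tabulation sheet rows: (time, equipment number, plate thickness in mm).
rows = [
    ("08:00", 1, 10.1), ("08:00", 1, 10.3), ("08:00", 2, 9.8), ("08:00", 2, 9.9),
    ("10:00", 1, 10.4), ("10:00", 1, 10.6), ("10:00", 2, 9.7), ("10:00", 2, 10.0),
]


def cell_means(records):
    """Mean thickness for each (equipment, time) cell."""
    cells = defaultdict(list)
    for time, equipment, thickness in records:
        cells[(equipment, time)].append(thickness)
    return {key: mean(values) for key, values in cells.items()}


means = cell_means(rows)
for (equipment, time), value in sorted(means.items()):
    print(f"equipment {equipment} at {time}: mean thickness {value:.2f} mm")
```

Plotting these means with time on the X axis, one line per equipment, would give a simple multi-vari view of the data.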

3.1 Patterns of Variation
3.1.1 Multi-Vari Analysis

The path to create a Multi-Vari chart in Minitab is:


Minitab > Stat > Quality Tools > Multi-Vari Chart

3.1 Patterns of Variation
3.1.2 Classes of Distributions

The data obtained from the measurement phase exhibits a variety of distribution, depending on the data type
and its source.
The methods used to describe the parameters for classes of distribution are:

• Probability
• It is based on an assumed model of distribution.
• Used to find the chances of a certain outcome/event occurring.
• Statistics
• Uses the measured data to determine a model to describe the data used.
• Inferential Statistics
• Describes the population parameters based on the sample data using a particular model.

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Types of Distributions
The two types of distribution are as follows:

• Discrete Distribution
• Binomial distribution
• Poisson distribution
• Continuous Distribution
• Normal distribution
• Chi-square distribution
• t-distribution
• F-distribution

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Binomial Distribution
The binomial distribution is a probability distribution for discrete data.

Characteristics of Binomial Distribution
• Predicts sample behavior
• Describes the discrete data as a result of a particular process
• Used to deal with defective items
• Best suited when the sample size is less than thirty and less than ten percent of the population

$P(r) = {}^{n}C_{r} \, p^{r} (1 - p)^{n-r}$
where P(r) = probability of exactly r successes out of a sample size of n;
p = probability of success; r = number of successes desired; n = sample size

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Binomial Distribution
Some of the key calculations of binomial distribution are shown.

Term: Formula
• Mean: $\mu = np$
• Standard Deviation: $\sigma = \sqrt{np(1 - p)}$
where n = sample size and p = probability of success

Sample factorial calculations:
5! = 5 ∗ 4 ∗ 3 ∗ 2 ∗ 1 = 120
4! = 4 ∗ 3 ∗ 2 ∗ 1 = 24

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Calculating Binomial Distribution - Example


A sample of size six is randomly selected from a batch of 14.28% nonconforming. Find the probability that the
sample has exactly two nonconforming units.
From the problem statement we know that
n = 6
x = 2 (the number of nonconforming units, r in the formula)
p = 0.1428
Filling in the numbers, we have
$P(X = 2) = \frac{6!}{2!\,(6-2)!} (0.1428)^{2} (0.8572)^{6-2}$
$= \frac{720}{(2)(24)} (0.0204)(0.5399)$
$= (15)(0.0204)(0.5399)$
$\approx 0.1651$
Thus, the probability that the sample contains exactly two nonconforming units is 0.1651.
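
The worked example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Values from the example: n = 6, two nonconforming units, p = 0.1428.
prob = binomial_pmf(2, 6, 0.1428)
print(f"P(X = 2) = {prob:.4f}")

# Mean and standard deviation of the same binomial distribution.
mean_defects = 6 * 0.1428
sigma_defects = (6 * 0.1428 * (1 - 0.1428)) ** 0.5
print(f"mean = {mean_defects:.4f}, sigma = {sigma_defects:.4f}")
```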

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Poisson Distribution
Poisson distribution is an application of the population knowledge to predict the sample behaviour.

Characteristics of Poisson Distribution
• Describes the discrete data
• Used to analyze situations wherein the number of trials is large
• Deals with integers which can take any value
• Used where the probability of success in each trial is very small
• Used for predicting the number of defects

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Poisson Distribution - Formula


The formula for the Poisson distribution is as follows:

$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where P(x) = probability of exactly x occurrences in a Poisson distribution
λ = mean number of occurrences during the interval
x = number of occurrences desired
e = base of the natural logarithm (approximately 2.71828)

Mean of a Poisson Distribution: $\mu = \lambda$

Standard Deviation of a Poisson Distribution: $\sigma = \sqrt{\lambda}$

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Calculating Poisson Distribution – Example


The number of defects per shift has a Poisson distribution with λ = 4.2. Find the probability that
the second shift produces fewer than two defects. Therefore, we will seek
$P(X < 2) = P(X = 0) + P(X = 1)$
Filling in the numbers, we have
$P(X = 0) = \frac{e^{-4.2} (4.2)^{0}}{0!} = 0.015$
$P(X = 1) = \frac{e^{-4.2} (4.2)^{1}}{1!} = 0.063$

$P(X < 2) = 0.015 + 0.063 = 0.078$
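
The same calculation can be reproduced with a short standard-library sketch; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean number of occurrences is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 4.2  # mean defects per shift, from the example
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)
p_fewer_than_two = p0 + p1
print(f"P(X=0) = {p0:.3f}, P(X=1) = {p1:.3f}, P(X<2) = {p_fewer_than_two:.3f}")
```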

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Continuous Probability Distribution


The continuous probability distribution is characterized by the probability density function.
• A variable is said to be continuous if the range of possible values falls along a continuum.
Example: Loudness of cheering at a ball game, the weight of cookies in a package, length
of a pen, or the time required to assemble a car.
• These distributions help in predicting the sample behaviour observed in a population.

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Normal Distribution
The Normal or Gaussian distribution is a continuous probability
distribution, illustrated as N (µ, σ).
• It has a higher frequency of values around the mean and
fewer occurrences away from it.
• It is used as a first approximation to describe real-valued
random variables that tend to cluster around a single mean
value.
• It is a bell-shaped curve and is symmetrical.
• The total area under the normal curve is 1; that is, the probability that x falls somewhere in
the distribution is 1.

[Figure: Normal Distribution with Mean = 100 and Standard Deviation = 10]

3.1 Patterns of Variation
3.1.2 Classes of Distributions

In a normal distribution, to standardize comparisons of dispersion, a standard Z variable is
utilized. The uses of Z value are as follows:
• It is unique for each probability within the normal distribution.
• It helps in finding probabilities of data points anywhere within the distribution.
• It is dimensionless with no units like mm, litres, coulombs, etc.

$Z = \frac{Y - \mu}{\sigma}$
where Z = number of standard deviations between Y and the µ
Y = value of the data point in concern
µ = mean of the population
σ = standard deviation of the population

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Q Suppose the time taken to resolve customer problems follows a normal distribution with the mean
value of 250 hours and standard deviation value of 23 hrs. What is the probability that a problem
resolution will take more than 300 hrs?

A Given:
● Y = 300
● µ = 250
● σ = 23
Using the formula: $Z = \frac{300 - 250}{23} = 2.17$

● From a Normal Distribution Table, the Z value of 2.17 covers an area of 0.98499 under itself
● Thus, the probability that a problem can be resolved in less than 300 hrs is 98.5%
● The chances of a problem resolution taking more than 300 hours is 1.5%
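
The table lookup in this example can be replaced by a direct computation. The sketch below uses `statistics.NormalDist` from the Python standard library; because it keeps the unrounded Z value, the result differs from the table value only in later decimal places.

```python
from statistics import NormalDist

mu, sigma, y = 250, 23, 300  # values from the example, in hours

z = (y - mu) / sigma
p_less = NormalDist().cdf(z)  # area under the standard normal curve up to z
p_more = 1 - p_less

print(f"Z = {z:.2f}")
print(f"P(resolution < 300 hrs) = {p_less:.1%}")
print(f"P(resolution > 300 hrs) = {p_more:.1%}")
```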

3.1 Patterns of Variation
3.1.2 Classes of Distributions

Chi-Square Distribution
If we obtain a random sample X1, X2, …, Xn of size n from a population that is normally distributed with
mean µ and finite variance σ², the random variable

$\chi^{2} = \frac{(n-1)s^{2}}{\sigma^{2}}$

is distributed as a chi-square distribution with n−1 degrees of freedom, where s² is the sample variance.
The formula for χ² will be useful later when we discuss hypothesis testing and confidence intervals.

3.1 Patterns of Variation
3.1.2 Classes of Distributions

T-Distribution
A t-distribution is most appropriate to be used when:
• The sample size <30;
• Population standard deviation is not known; and
• Population is approximately normal.

The t-distribution approaches normality as the sample size increases.

3.1 Patterns of Variation
3.1.2 Classes of Distributions

F-Distribution
The F-distribution is a ratio of two Chi-square distributions, and a specific F-distribution is denoted by the
degrees of freedom for the numerator Chi-square and the degrees of freedom for the denominator Chi-square.

$F_{calculated} = \frac{s_1^{2}}{s_2^{2}}$

where s1 and s2 = standard deviations of the two samples, so that F is the ratio of the two sample variances

● If Fcalculated is 1, there is no difference in the variance
● Place the larger variance in the numerator, so that if s1 > s2 the numerator is greater than the denominator (df1 = n1 – 1 and df2 = n2 – 1)

Refer to an F-table to find the critical F value at α and the degrees of freedom of the samples of the two
different processes (df1 and df2)
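
A minimal sketch of the F calculation is shown below. The two samples are hypothetical, and the critical value still has to be read from an F-table, since the Python standard library has no F-distribution.

```python
from statistics import variance

# Hypothetical measurements from two processes (illustrative values only).
sample_a = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
sample_b = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]


def f_ratio(a, b):
    """Return (F, df_numerator, df_denominator) with the larger sample variance on top."""
    var_a, var_b = variance(a), variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


f_calc, df1, df2 = f_ratio(sample_a, sample_b)
print(f"F = {f_calc:.2f} with df1 = {df1}, df2 = {df2}")
```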

3.2 Inferential Statistics

3.2 Inferential Statistics
3.2.1 Understanding Inference

Types of Statistics
Statistics refers to the science of collection, analysis, interpretation, and presentation of data. There are two
major types of statistics-Descriptive statistics and Inferential statistics.

• Descriptive Statistics
• Also known as Enumerative statistics
• Includes organizing, summarizing, and presenting the data
• Describes what's going on in the data
• Histograms, pie charts, box plots, etc., are the tools
• Inferential Statistics
• Also known as Analytical statistics
• Includes predicting and drawing conclusions
• Makes inferences from our data to more general conditions
• Hypothesis testing, scatter diagrams, etc., are the tools

3.2 Inferential Statistics
3.2.1 Understanding Inference

Inferential statistics is a set of methods used to draw conclusions or inferences about characteristics of
populations based on data from a sample. The characteristics of interest are population parameters, such as
the mean calculated for a population and the standard deviation calculated for a population. The objective
of statistical inference is to draw conclusions about population characteristics based on the information
contained in a sample.

Statistical inference in a practical situation contains two elements:


• The inference
• A measure of its validity

3.2 Inferential Statistics
3.2.3 Central Limit Theorem

Central Limit Theorem (CLT) states that for a sample size greater than 30, the distribution of the sample
mean is approximately normal and centered on the population mean.
• When sample size is greater than 30, the distribution of the sample mean approaches a normal distribution.
• In such cases, the Standard Error of Mean (SEM), which represents the variability between the
sample means, is small.

$SEM = \frac{\text{Population Standard Deviation}}{\sqrt{\text{Sample Size}}}$

Selecting a sample size also depends on the concept called Power of the Test.

3.2 Inferential Statistics
3.2.3 Central Limit Theorem

The Central Limit Theorem concludes the following:

• The sampling distribution of the sample means approaches a normal distribution as the sample size
gets larger, no matter what the shape of the population distribution.
• This holds especially well for sample sizes over 30: as you take more samples, especially large ones,
the graph of the sample means will look more like a normal distribution.

• CLT aids in making inferences from the sample statistics about the population
parameters irrespective of the distribution of the population.
• CLT becomes the basis for calculating the confidence interval for a hypothesis
test as it allows the use of a standard normal table.
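
The behaviour described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential (clearly non-normal, with mean 1 and standard deviation 1), and the sample size of 36, the 5000 repetitions, and the random seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 36               # sample size (greater than 30)
trials = 5000        # number of samples drawn
population_sd = 1.0  # an exponential population with rate 1 has mean 1 and sd 1

# Mean of each sample drawn from the skewed population.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(trials)
]

sem = population_sd / n**0.5  # Standard Error of Mean: sigma / sqrt(n)
grand_mean = mean(sample_means)
observed_sd = stdev(sample_means)

print(f"theoretical SEM      = {sem:.4f}")
print(f"sd of sample means   = {observed_sd:.4f}")
print(f"mean of sample means = {grand_mean:.4f}")
```

The spread of the simulated sample means should land close to the theoretical SEM, even though the underlying population is far from normal.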

3.3 Hypothesis Testing

3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing

The steps involved in statistical inference are:

• Define the problem objective precisely
• Decide if the problem will be evaluated by a one-tail or two-tail test
• Formulate a null hypothesis and an alternate hypothesis
• Select a test distribution and a critical value of the test statistic reflecting the degree of uncertainty that can
be tolerated (the alpha, α, risk)
• Calculate a test statistic value from the sample information
• Make an inference about the population by comparing the calculated value to the critical value. This step
determines if the null hypothesis is to be rejected. If the null is rejected, the alternate is accepted
• Communicate the findings to interested parties

3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing

Statistical and Practical Significance of Hypothesis Test


The differences between a variable and its hypothesized value may be statistically significant but may not
be practical or economically meaningful.

Example: Based on the hypothesis test, Nutri Worldwide Inc. implemented a trading strategy.
The returns:
• Are economically significant when logical reasons are examined before implementation.
• May not be significant when the statistically proven strategy is implemented directly.
• May be economically insignificant due to taxes, transaction costs, and risks.

3.3 Hypothesis Testing
Examples of the Null Hypothesis and Alternate Hypothesis

1. A cement plant has found that the historical mean strength of cement is 25 units. The Company wants to
assess whether the mean strength continues to be the same.
• In the null hypothesis, we will assume that the mean strength (25 units) has not changed. Therefore the
null and alternate hypotheses will be written as:
• Ho: µ = 25
• H1: µ ≠ 25
• The number of tails is 2, as we want to assess whether the mean strength has changed

2. We want to evaluate whether a new incentive scheme has increased the mean daily production of the
company.
• The historical mean is µo. In the null hypothesis, we will assume that the mean production level has not
changed.
• Therefore the null and alternate hypothesis would be written as
• Ho: µ = µo
• H1: µ > µo
• The number of tails =1 (right tail) as we want to assess whether the mean production has increased.

3.3 Hypothesis Testing
Examples of the Null Hypothesis and Alternate Hypothesis

3. A company has appointed a new courier service. They wish to assess whether the package is delivered
faster than before.
• In the Null hypothesis, we will assume that the mean delivery time µo has not changed; the null and
alternate hypothesis will, therefore, be written as
• Ho: µ = µo
• H1: µ < µo
• The number of tails = 1 (Left tail) as we want to assess whether the mean service time has reduced.

3.3 Hypothesis Testing with Normal Data
3.3.1 General Concepts and Goals of Hypothesis Testing

Null Hypothesis vs. Alternate Hypothesis


The conceptual differences between a null and an alternate hypothesis are as follows:

• Null Hypothesis
• Represented as H0
• Cannot be proved, only rejected
• Example: Movie is good
• Alternate Hypothesis
• Represented as Ha
• Challenges the null hypothesis
• Example: Movie is not good

If the null hypothesis is rejected, the data support the alternate hypothesis.

3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing

What is Confidence Interval?


A confidence interval is a range of values, calculated from sample data, that is likely to contain the
population parameter. Three confidence levels are commonly used: 95%, 90%, and 99%. The default is
95%, but 90% and 99% can also be used. The statistical term alpha is derived from the confidence level:
α = 1 − 0.95 = 0.05, 1 − 0.99 = 0.01, or 1 − 0.90 = 0.10. We can calculate the confidence interval
using formulas given in statistics.

3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing

For Example :- Suppose an Estimate is needed for the average coating thickness for a population of 1000
circuit boards received from a supplier. Rather than measure the coating thickness on all 1000 boards one
might randomly pick up 36 boards for measurement. Suppose the average coating thickness of these 36
boards is 0.003, and the standard deviation of the 36 coating measurements is 0.0005. The standard deviation
is assumed known from past experience. Determine the 95% confidence interval for the true mean.
• As the sample size is greater than 30, we use the Z table (available through a Google search); for a 95%
confidence level, Zα/2 = 1.96. We also have
• α = 0.05, X̄ = 0.003, σ = 0.0005, n = 36
• We will use the statistical formula for the confidence interval, which is given below:
• $\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• Substituting the values in the formula we obtain
• $0.003 - 1.96 \cdot \frac{0.0005}{\sqrt{36}} \le \mu \le 0.003 + 1.96 \cdot \frac{0.0005}{\sqrt{36}}$
• 0.00284 ≤ μ ≤ 0.00316
• Thus the 95% confidence interval for the mean is (0.00284, 0.00316)
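
The interval can be reproduced with a short sketch. It assumes, as the example does, that the population standard deviation is known; the function name `z_confidence_interval` is chosen here for illustration, and the Z value is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the coating thickness example.
low, high = z_confidence_interval(0.003, 0.0005, 36)
print(f"95% CI: ({low:.5f}, {high:.5f})")
```

Passing `confidence=0.90` or `confidence=0.99` gives the other two common intervals.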

3.3 Hypothesis Testing
3.3.2 Significance; Practical vs. Statistical

Comparing Two Situations – Asking “Are they different?”


Ho: Null Hypothesis: There is no difference
Ha: Alternate Hypothesis: There is a difference

1. Determine the hypothesis: the hypothesis is usually stated as "no difference"
2. Calculate the P-value: the test type depends on what you want to know
3. Ask: is the P-value < 0.05?
• NO: Cannot reject the null hypothesis. No statistical evidence for a difference
• YES: Reject the null hypothesis. Statistical evidence for a difference

3.3 Hypothesis Testing
3.3.2 Significance; Practical vs. Statistical

Truth Table
• Do not reject Ho, and Ho is true: Correct decision
• Do not reject Ho, and Ha is true: Type II error (β), or consumer risk. You do not reject Ho when Ha is true
• Reject Ho, and Ho is true: Type I error (α), or producer risk. You reject Ho when Ho is true
• Reject Ho, and Ha is true: Correct decision

The P-value is the probability of obtaining a result at least as extreme as the one observed when Ho is true.
When α = 0.05, then P-value < 0.05 is our judgment criterion.
We say that the decision is made at the 95% (1 − α) confidence level.

3.3 Hypothesis Testing
3.3.3 Risk; Alpha & Beta

Alpha risk is the risk of incorrectly deciding to reject the null hypothesis. If the confidence level is 95%,
then the alpha risk is 5% or 0.05.
Alpha risk is also called False Positive and Type I Error.

Confidence Level = 1 - Alpha Risk

Alpha is called the significance level of a test. The level of significance is commonly between 1% and 10%
but can be any value depending on your desired level of confidence or need to reduce Type I error.
Selecting 5% signifies that there is a 5% chance of rejecting the null hypothesis when it is actually true.

3.3 Hypothesis Testing
3.3.3 Risk; Alpha & Beta

Beta risk is the risk that the decision will be made that the part is not defective when it really is.

If the power desired is 90%, then the Beta risk is 10%.

There is a 10% chance that the decision will be made that the part is not defective when in reality it is defective.

Power = 1 - Beta risk


Beta risk is also called False Negative and Type II Error.

Power is the probability of correctly rejecting the Null Hypothesis.

The Null Hypothesis is technically never proven true. It is either "failed to reject" or "rejected."

"Failed to reject" does not mean the null hypothesis is accepted, since the null hypothesis is established
only to be tested, and possibly proven false, against the sample data.

Hypothesis Testing Possible Scenarios

• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from two or multiple samples (for Y)

• Take two or more samples for the Y data from the population and conduct appropriate test(s) to draw inferences
about the population

• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from one sample (for X and Y)

• Take one sample for the X and Y data from the respective populations and conduct appropriate test(s) to draw
inferences about the populations

• During Analyse phase, study or establish a correlation between X and Y

• This helps in understanding which X has a max impact on Y and therefore shortlist critical Xs

• During Improve phase, repeat the appropriate tests above to verify and confirm process improvements

3.3 Hypothesis Testing
3.3.4 Types of Hypothesis Testing

There are 2 types of hypothesis testing:

• Parametric hypothesis testing
• Non-parametric hypothesis testing
Parametric hypothesis testing focusses on the mean and the standard deviation of the
sample, and non-parametric hypothesis testing focusses on the median.

3.4 Hypothesis Testing with
Normal Data

3.4 Hypothesis Testing with Normal Data
Examples of Parametric Hypothesis Testing

• 1-Sample t-test (Mean v/s Target): This test is used to compare the mean of a process with a target value, such as an
ideal goal mean, to determine whether they differ.
• 1-Sample Standard Deviation: This test is used to compare the standard deviation of the process with a target value,
such as a benchmark, to determine whether they differ. It is often used to evaluate how consistent a process is.
• 2-Sample t-test (Comparing 2 Means): Two sets of different items are measured, each under a different condition, so the
measurements of one sample are independent of the measurements of the other sample.
Example: with two populations and two samples, this test can determine whether the average expenditure of
male customers is equal to the average expenditure of female customers.
• Paired t-test: The same set of items is measured under 2 different conditions; therefore, the 2 measurements of the
same item are dependent or related to each other.
• 2-Sample Standard Deviation: This test is used when comparing the standard deviations of 2 samples.
• Standard Deviation test: This test is used when comparing the standard deviations of more than 2 samples.

3.4 Hypothesis Testing with Normal Data
Examples of Parametric Hypothesis Testing

• Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller
sample size (n < 30). Both methods assume a normal distribution of the data, but the z-tests are most useful when
the standard deviation is known.
• A t-test is usually done to compare the means of two treatments. For instance, if we want to compare the
performance of a machine before and after some adjustments are performed on it, the mean of one sample of
products taken prior to the adjustments can be compared to the mean of another sample taken after the
adjustments. In that case, a t-test can be useful.

3.4 Hypothesis Testing with Normal Data
Examples of Parametric Hypothesis Testing

• Hypothesis testing based on the t-test is conducted using the degrees of freedom and the confidence
level, but when two sample means are being compared, there is always room for error. If alpha = 0.05,
there is a 5% chance of rejecting a null hypothesis that happens to be true. If, for instance, three sample
means A, B, and C are being compared using the t-test at a confidence level of 95%, two factors are compared
at a time.
• A is compared with B, then A with C, and then B with C. Every time two factors are compared, there is a 0.05
probability of rejecting a true null hypothesis. Therefore, when three factors are compared using the t-test,
the probability of making a Type I error is inflated. To limit this inflation of the Type I error, we can
use analysis of variance (ANOVA).
• ANOVA is a hypothesis test used when more than two factor means are being compared.
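
The inflation of the Type I error can be made concrete with a small calculation. The sketch below assumes the pairwise comparisons are independent, which the slides do not state and which is only approximately true in practice; the function name is illustrative.

```python
from math import comb


def familywise_error(alpha, groups):
    """Number of pairwise comparisons and the chance of at least one Type I error.

    Assumes the comparisons are independent (a simplifying assumption).
    """
    comparisons = comb(groups, 2)
    return comparisons, 1 - (1 - alpha) ** comparisons


pairs, fwer = familywise_error(0.05, 3)  # A vs B, A vs C, B vs C
print(f"{pairs} comparisons, overall Type I error risk = {fwer:.4f}")
```

With three means the overall risk is well above the 0.05 of a single comparison, which is the motivation for using ANOVA.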

3.4.1 1 & 2 Sample t-tests

1-Sample t-test
• Use 1-Sample t to estimate the mean of a population and to compare it to a target value or a reference
value when you do not know the standard deviation of the population. Using this analysis, you can do the
following:
• Determine whether the population mean differs from the hypothesized mean that you specify.
• Calculate a range of values that is likely to include the population mean.
• For example, a quality analyst uses a 1-sample t-test to determine whether the average thread length of
bolts differs from the target of 20 mm. If the mean differs from the target, the analyst uses the confidence
interval to determine how large the difference is likely to be and whether that difference has practical
significance.
• Where to find this analysis
• To perform a 1-sample t-test, choose Stat > Basic Statistics > 1-Sample t.
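
Outside Minitab, the same calculation can be sketched in a few lines. The thread lengths below are hypothetical illustration data, not taken from the slides, and the critical value 2.262 is the two-sided 5% t-table value for 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, target):
    """Return the sample mean, its standard error, and the t statistic."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, std_err, (mean - target) / std_err

# Hypothetical thread lengths in mm (illustration only)
lengths_mm = [20.1, 19.8, 20.4, 20.3, 19.9, 20.6, 20.2, 20.5, 20.0, 20.2]
T_CRIT_DF9 = 2.262  # t-table value for df = 9, two-sided alpha = 0.05

mean, std_err, t_stat = one_sample_t(lengths_mm, target=20.0)
reject_h0 = abs(t_stat) > T_CRIT_DF9
ci = (mean - T_CRIT_DF9 * std_err, mean + T_CRIT_DF9 * std_err)

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The confidence interval is what the analyst would use to judge whether a statistically significant difference is also practically significant.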


Application of 1 Sample t-test


An economist wants to determine whether the monthly energy cost for families has changed from the
previous year, when the mean cost per month was $200. The economist randomly samples 25 families and
records their energy costs for the current year.
The economist performs a 1-sample t-test to determine whether the monthly energy cost differs from $200.
• Open the sample data, Family Energy Cost.MTW.
• Choose Stat > Basic Statistics > 1-Sample t.
• From the drop-down list, select One or more samples, each in a column and enter Energy Cost.
• Select Perform hypothesis test.
• In the Hypothesized mean, enter 200.
• Click OK.


1-Sample t Minitab Output


Interpret the results
• The null hypothesis states that the mean of the energy costs is $200.
Because the p-value is 0.000, which is less than the significance level of
0.05, the economist rejects the null hypothesis and concludes that the
average monthly energy cost for families differs from $200. The 95% CI
indicates that the population mean is likely to be greater than $200.


Comparison of Means of Two Processes


• Means of two processes are compared to:
• Understand the significant difference in the outcome of the two processes;
• Understand whether a new process is better than an old process;
• Understand whether the two samples belong to the same population or a different population; and
• Benchmark the existing process with another process.


2-Sample t-test
The average heights of men in two different sets of people are compared to see if the means are significantly different.
For this test, the sample sizes, means and variances are required to calculate the value of t. Two samples of sizes n1 =
125 and n2 = 110 are taken from the two populations. The mean of sample 1 is 167.3 and the mean of sample 2 is
165.8. The standard deviations of samples 1 and 2 are 4.2 and 5.0 respectively.


Paired Comparison Hypothesis Test for Means (Theoretical)


The two-mean t-test with unequal variances is:
• H0: μ1 = μ2 against Ha: μ1 ≠ μ2
• Two samples of sizes n1 = 125 and n2 = 110 are taken from the two populations
• X̄1 = 167.3, X̄2 = 165.8, s1 = 4.2, s2 = 5.0 are the sample means and SDs respectively
• Compute the test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \dfrac{167.3 - 165.8}{\sqrt{4.2^2/125 + 5.0^2/110}} \approx 2.47$
• Reject H0 at the level of significance α if |computed t| > $t_{DF,\,\alpha/2}$
• Since |computed t| ≈ 2.47 exceeds $t_{223,\,0.025}$ = 1.96, the null hypothesis is rejected at the 5% level of significance
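
The test statistic for this example can be checked numerically. This is a minimal sketch of the unequal-variance two-sample t statistic using the summary figures from the slide; the function name is hypothetical, and the critical value 1.96 is the one quoted on the slide.

```python
import math

def two_sample_t_unequal_var(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples with unequal variances."""
    std_err = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / std_err

T_CRIT = 1.96  # critical value quoted on the slide

t_stat = two_sample_t_unequal_var(167.3, 4.2, 125, 165.8, 5.0, 110)
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```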

3.4.2 1 Sample Variance

Hypothesis Test for 2 Variances – Example


Susan wants to compare the standard deviations of the earnings of two companies. According to her, the earnings of
Company A are more volatile than those of Company B. She has obtained earnings data for the past 31 years for
Company A and for the past 41 years for Company B. She finds that the sample standard deviation of Company A’s
earnings is $4.40 and of Company B’s earnings is $3.90. Determine whether the earnings of Company A have a greater
standard deviation than those of Company B at the 5% level of significance.


Hypothesis Test for Equality of Variance – F-test Example


The degrees of freedom for company A and company B are:
• dfA (degrees of freedom of A) = 31 – 1 = 30
• dfB (degrees of freedom of B) = 41 – 1 = 40
The critical value from F-table equals 1.74. The null hypothesis is rejected if the F-test statistic is greater than 1.74.

Calculation of F-test statistic: $F = s_A^2 / s_B^2 = 4.40^2 / 3.90^2 = 1.273$

Results: The F-test statistic (1.273) is not greater than the critical value (1.74). Therefore, at 5% significance level,
the null hypothesis cannot be rejected.
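
The arithmetic of this F-test can be reproduced as follows. This is a small sketch using the slide's figures; the critical value 1.74 is taken from the slide rather than computed, and the function name is hypothetical.

```python
def f_statistic(sd_numerator: float, sd_denominator: float) -> float:
    """Ratio of two sample variances, given the sample standard deviations."""
    return sd_numerator ** 2 / sd_denominator ** 2

F_CRIT = 1.74                  # from the F-table, as quoted on the slide
df_a, df_b = 31 - 1, 41 - 1    # degrees of freedom for Company A and B

f_stat = f_statistic(4.40, 3.90)
reject_h0 = f_stat > F_CRIT

print(f"F({df_a}, {df_b}) = {f_stat:.3f}, reject H0: {reject_h0}")
```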

3.4.3 One Way Anova

• A chemical engineer wants to compare the hardness of four blends of paint. Six samples of each paint blend
were applied to a piece of metal. The pieces of metal were cured. Then each sample was measured for hardness.
In order to test for the equality of means and to assess the differences between pairs of means, the analyst uses
one-way ANOVA with multiple comparisons.
• Open the sample data, Paint Hardness.MTW.
• Choose Stat > ANOVA > One-Way.
• Select Response data are in one column for all factor levels.
• In Response, enter Hardness.
• In Factor, enter Paint.
• Click the Comparisons button, then select Tukey.
• Click OK in each dialog box.
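
For readers without Minitab, the F statistic behind one-way ANOVA can be sketched directly. The hardness values below are hypothetical (three readings per blend instead of the six in the example, to keep the arithmetic short), and 4.07 is the 5% F-table value for (3, 8) degrees of freedom. The Tukey comparisons are not reproduced here.

```python
import statistics

def one_way_anova_f(groups):
    """Return the F statistic and the between/within degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical hardness readings for four paint blends (illustration only)
blends = [[10, 12, 14], [15, 17, 19], [11, 13, 15], [18, 20, 22]]
F_CRIT_3_8 = 4.07  # F-table value for df = (3, 8), alpha = 0.05

f_stat, df_b, df_w = one_way_anova_f(blends)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, reject H0: {f_stat > F_CRIT_3_8}")
```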


One Way Anova Minitab Output


ANOVA- Test for equal variances



[Figure: Test for Equal Variances: Hardness vs Paint. 95% Bonferroni confidence intervals for StDevs of Blend 1 to Blend 4. Bartlett’s Test P-Value: 0.441]



One Way Anova Interpretation


The p-value for the paint hardness ANOVA is less than 0.05. This result indicates that the hardness of the paint
blends differs significantly. The engineer knows that some of the group means are different.

3.5 Hypothesis Testing with Non-Normal Data

Non-Parametric Hypothesis Test

• Non-parametric tests are used when data are not normal. Examples of non-parametric tests,
many of which focus on the median, are given below:
• Mann-Whitney
• Kruskal-Wallis
• Mood's Median
• Friedman
• 1 Sample Sign
• 1 Sample Wilcoxon
• One and Two Sample Proportion
• Chi Square tests

3.5.1 Mann-Whitney Test

Mann-Whitney Test Example


• A state highway department uses two brands of paint for painting stripes on roads. A highway official
wants to know whether the durability of the two brands of paint differs. For each paint, the
official records the number of months the paint persists on the highway.
• The official performs a Mann-Whitney test to determine whether the median number of months that
the paint persists differs between the two brands.
• Open the sample data, Highway Paint.MTW.
• Choose Stat > Nonparametrics > Mann-Whitney.
• In First Sample, enter Brand A.
• In Second Sample, enter Brand B.
• Click OK.
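
The statistic behind the Mann-Whitney test can be sketched by counting pairs. The durability figures below are hypothetical, not the Minitab sample data, and the p-value uses a normal approximation without continuity or tie correction, which is rough for samples this small.

```python
import math

def mann_whitney_u(sample_a, sample_b):
    """U statistics: count pairs where one sample's value beats the other's."""
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in sample_a for b in sample_b)
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

def normal_approximation(u, n_a, n_b):
    """z score and two-sided p-value for U under the null hypothesis."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical months of durability for each brand (illustration only)
brand_a = [32, 36, 34, 30, 35]
brand_b = [38, 40, 37, 41, 33]

u_a, u_b = mann_whitney_u(brand_a, brand_b)
z, p_value = normal_approximation(u_a, len(brand_a), len(brand_b))
print(f"U_A = {u_a}, U_B = {u_b}, z = {z:.3f}, p = {p_value:.3f}")
```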


Interpretation using P Values of Mann-Whitney Test


The null hypothesis states that the difference in the median
number of months that the paint persists between the two
brands is 0. Because the p-value is 0.0019, which is less than the
significance level of 0.05, the official rejects the null hypothesis.
The official concludes that the difference in the median number
of months the paint persists between the two brands is not 0.
The 95.5% CI indicates that the population median of
Brand B is likely to be greater than that of Brand A.

3.5.2 Kruskal-Wallis Test

The Kruskal-Wallis test is also a non-parametric test, used for testing whether samples originate from the same distribution.
Characteristics of the Kruskal-Wallis test are as follows:
• It is a one-way analysis of variance by ranks.
• Medians of two or more samples are compared to determine whether the samples come from the same population.
• Unlike the analogous one-way analysis of variance, it does not assume a normal distribution of the residuals.

• The null hypothesis is that the medians of all the groups are equal, and
• The alternative hypothesis is that the population median of at least one group is different from
that of at least one other group.
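
The rank-based calculation can be sketched as follows. The bed counts are hypothetical illustration data, the H statistic is computed without the tie adjustment, and 5.991 is the 5% chi-square table value for 2 degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (no tie adjustment) from the rank sums of each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

# Hypothetical unoccupied-bed counts for three hospitals (illustration only)
hospitals = [[12, 15, 18, 20], [28, 31, 35, 30], [14, 17, 22, 25]]
CHI2_CRIT_DF2 = 5.991  # chi-square table value for df = 2, alpha = 0.05

h_stat = kruskal_wallis_h(hospitals)
print(f"H = {h_stat:.3f}, reject H0: {h_stat > CHI2_CRIT_DF2}")
```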


Example of Kruskal-Wallis Test and Mood’s Median Test

A health administrator wants to compare the number of unoccupied beds for three hospitals in the same city. The administrator
randomly selects 11 different days from the records of each hospital and enters the number of unoccupied beds for each day.

To determine whether the median number of unoccupied beds differs, the administrator uses the Kruskal-Wallis test.

1. Open the sample data, HospitalBeds.MTW.


2. Choose Stat > Nonparametrics > Kruskal-Wallis.
3. In Response, enter Beds.
4. In Factor, enter Hospital.
5. Click OK.


Interpret the results


The sample medians for the three hospitals are 16.00, 31.00, and 17.00. The average ranks show that hospital 2 differs the
most from the average rank for all observations and that this hospital is higher than the overall median.
Both p-values are less than 0.05. The p-values indicate that the median number of unoccupied beds differs for at least one
hospital.


Kruskal-Wallis Test: Beds versus Hospital


Descriptive Statistics

Hospital   N    Median   Mean Rank   Z-Value
1          11   16       14.0        -1.28
2          11   31       23.3         2.65
3          11   17       13.7        -1.37
Overall    33            17.0



Test

Null hypothesis          H₀: All medians are equal
Alternative hypothesis   H₁: At least one median is different

Method                  DF   H-Value   P-Value
Not adjusted for ties    2      7.05     0.029
Adjusted for ties        2      7.05     0.029


Mood Median Test: Beds versus Hospital

Mood median test for Beds

Chi-Square = 7.52   DF = 2   P = 0.023

                                   Individual 95.0% CIs
Hospital   N≤   N>   Median   Q3-Q1   -----+---------+---------+---------+-
1           7    4     16.0    23.0   (---------*------------)
2           2    9     31.0    12.0   (----*-------)
3           8    3     17.0    24.0   (-----------*-----------)
                                      -----+---------+---------+---------+-
                                      10 20 30 40

Overall median = 24.0

3.5.3 Mood’s Median Test

Mood’s median test is a non-parametric test that is used to test the equality of medians from two or
more different populations. This test works when:
• The output (Y) variable is continuous, discrete-ordinal or discrete-count, and
• The input (X) variable is discrete with two or more attributes.

The steps involved in Mood’s Median test are as follows:

1. Find the median of the combined data set.
2. Find the number of values in each sample that are greater than the median.
3. Form a contingency table.
4. Find the expected value for each cell.
5. Find the chi-square value.
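
The steps above can be sketched in code. The samples are hypothetical illustration data, values equal to the combined median are counted in the "at or below" row (one common convention, assumed here), and 5.991 is the 5% chi-square table value for 2 degrees of freedom.

```python
import statistics

def moods_median_chi_square(samples):
    """Follow the Mood's median steps and return median, table, chi-square."""
    # Step 1: median of the combined data set
    grand_median = statistics.median([x for s in samples for x in s])
    # Step 2: number of values in each sample above the median
    above = [sum(1 for x in s if x > grand_median) for s in samples]
    at_or_below = [len(s) - a for s, a in zip(samples, above)]
    # Step 3: contingency table (rows: above, at or below; columns: samples)
    table = [above, at_or_below]
    total = sum(len(s) for s in samples)
    # Steps 4 and 5: expected value for each cell, then the chi-square value
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for observed, s in zip(row, samples):
            expected = row_total * len(s) / total
            chi2 += (observed - expected) ** 2 / expected
    return grand_median, table, chi2

# Hypothetical unoccupied-bed counts for three hospitals (illustration only)
samples = [[12, 15, 18, 20], [28, 31, 35, 30], [14, 17, 22, 25]]
CHI2_CRIT_DF2 = 5.991  # chi-square table value for df = 2, alpha = 0.05

grand_median, table, chi2 = moods_median_chi_square(samples)
print(grand_median, table, round(chi2, 2), chi2 > CHI2_CRIT_DF2)
```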

3.5.4 Friedman Test

The Friedman test is a non-parametric test that makes no assumptions about the shape of the
distribution from which the samples originate.
• It allows smaller sample data sets to be analysed, and
• Unlike ANOVA, it does not require the dataset to be randomly sampled from normally distributed
populations with equal variances.
Note: The null hypothesis of the test is that the population medians of all treatments are
identical.

3.5.5 1 Sample Sign Test

The 1 Sample Sign test is the simplest of the non-parametric tests and can be used instead of a
one sample t-test.
• Here, H0 states that the population median equals the hypothesized (assumed) median.
The steps involved in the 1 Sample Sign test are as follows:

1. Count the number of positive values: values that are larger than the hypothesized median.
2. Count the number of negative values: values that are smaller than the hypothesized median.
3. Test the values: check if there are significantly more positives (or negatives) than expected.
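
These three steps translate into a short exact calculation. The sketch below uses hypothetical scores and a hypothetical hypothesized median of 3.7; under H0 the number of positives follows a binomial distribution with p = 0.5, which gives the two-sided p-value.

```python
from math import comb

def sign_test(sample, hypothesized_median):
    """Count positives and negatives and return an exact two-sided p-value."""
    positives = sum(1 for x in sample if x > hypothesized_median)
    negatives = sum(1 for x in sample if x < hypothesized_median)
    n = positives + negatives          # values equal to the median are ignored
    k = min(positives, negatives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)

# Hypothetical measurements (illustration only)
scores = [4.1, 3.9, 4.4, 3.5, 4.0, 4.2, 3.8, 4.3, 3.6, 4.5]

positives, negatives, p_value = sign_test(scores, hypothesized_median=3.7)
print(positives, negatives, round(p_value, 4), p_value < 0.05)
```

Here 8 positives against 2 negatives is not enough to reject H0 at the 5% level, which illustrates the low power of the sign test with small samples.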

3.5.6 1 Sample Wilcoxon Test

The 1 Sample Wilcoxon test, also known as the Wilcoxon Signed Rank test, is a non-parametric test.
This test is:
• The non-parametric counterpart of the parametric One Sample t-Test, and
• More powerful than the non-parametric 1 Sample Sign Test.


Characteristics of 1 Sample Wilcoxon Test


Some characteristics of this test are as follows:
• It assumes the existing sample is randomly taken from a population, with a symmetric
frequency distribution around the median, and
• The symmetry can be observed with a histogram, or by checking if the median and mean are
approximately equal.

Conclusion: if the sample is consistent with the hypothesized median (the mid-point of the
distribution), you fail to reject the null hypothesis. If not, reject the null hypothesis.


1 Sample Wilcoxon Test - Example


An example of the 1 Sample Wilcoxon test is shown.
The median customer satisfaction score of an organization has always been 3.7, and the management wants to
see if this has changed. They conducted a survey and got the results grouped by customer type.
Conclusion:
• If the data are consistent with median = 3.7, fail to reject H0
• If the data indicate median ≠ 3.7, reject H0
• α = 0.05
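
A sketch of the signed-rank calculation for this kind of question follows. The survey scores are hypothetical (the slide does not list the data), and the p-value uses a normal approximation without continuity correction, which is only rough for a sample this small.

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(sample, hypothesized_median):
    """Return W+, W-, z and a two-sided p-value (normal approximation)."""
    diffs = [round(x - hypothesized_median, 9) for x in sample]
    diffs = [d for d in diffs if d != 0]            # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical satisfaction scores (illustration only)
scores = [3.9, 4.3, 3.4, 4.1, 4.6, 3.8, 4.4, 4.8]

w_plus, w_minus, z, p_value = wilcoxon_signed_rank(scores, 3.7)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}, reject H0: {p_value < 0.05}")
```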

3.5.7 One and Two Sample Proportion Test

One and Two Sample Proportion


• 1 Proportion Test: analyzes the difference between a sample proportion and a target
• 2 Proportion Test: analyzes the difference between two independent sample proportions



Example of Hypothesis Test-1 Proportion


A marketing analyst wants to determine whether mailed advertisements for a new product result in a response
rate different from the national average. A random sample of 1000 households is chosen to receive
advertisements. Of the 1000 households sampled, 87 make a purchase after receiving the advertisement.
The analyst performs a 1 proportion test to determine whether the proportion of households that made a
purchase is different from the national average of 6.5%.
1. Choose Stat > Basic Statistics > 1 Proportion.
2. From the drop-down list, select Summarized data.
3. In Number of events, enter 87.
4. In Number of trials, enter 1000.
5. Select Perform hypothesis test.
6. In Hypothesized proportion, enter 0.065.
7. Click OK.


Interpretation of 1 Sample Proportion Test


• The null hypothesis states that the proportion of households
that make a purchase equals 0.065. Because the p-value is
0.008, which is less than the significance level of 0.05, the
analyst rejects the null hypothesis. The results indicate that
the proportion of households that make a purchase is
different from the national average of 6.5%.
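
The same comparison can be sketched with the normal approximation to the binomial. This is an illustrative alternative, not the method behind the Minitab output: Minitab may use a different (for example, exact) calculation, so the p-value from this sketch need not equal the 0.008 reported above. The function name is hypothetical.

```python
import math

def one_proportion_z(events: int, trials: int, p0: float):
    """z statistic and two-sided p-value for H0: proportion = p0."""
    p_hat = events / trials
    std_err = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / std_err
    return p_hat, z, math.erfc(abs(z) / math.sqrt(2))

p_hat, z, p_value = one_proportion_z(events=87, trials=1000, p0=0.065)
print(f"sample proportion = {p_hat:.3f}, z = {z:.3f}, reject H0: {p_value < 0.05}")
```

Both approaches lead to the same decision here: the sample proportion of 8.7% differs significantly from 6.5% at the 0.05 level.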

3.5.8 Chi-Square Distribution

The Chi-square distribution (χ²-distribution) or Chi-squared:


• Is a widely used probability distribution in inferential statistics;
• Needs one sample for the test to be conducted; and
• With k degrees of freedom, is the distribution of a sum of the squares of k independent standard
normal random variables.

$\chi^2_{\text{calculated}} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
Where,
• $\chi^2_{\text{calculated}}$ = chi-square index
• $f_o$ = an observed frequency
• $f_e$ = an expected frequency


Chi-Square Test - Example


The Australian hockey team wishes to analyze its wins at home and abroad against four different
countries. The data have two classifications (place played and country played against):
• The table is called a 2 X 4 contingency table.
• Expected frequency for each of the observed frequencies = (row total)(column total)/overall total.

[Table: sample statistics and estimated population parameters]

Example: The observed frequency of 3 wins against South Africa in Australia converts to the
expected frequency of (21 / 31) * 5 = 3.39.
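
The expected-frequency rule can be checked for the single cell quoted in the example. The full contingency table is not reproduced in the slides, so this sketch covers that one cell only; treating 21 as the row total and 5 as the column total is an assumption based on the formula shown.

```python
def expected_frequency(row_total: float, column_total: float, overall_total: float) -> float:
    """Expected count for one cell of a contingency table."""
    return row_total * column_total / overall_total

def cell_contribution(observed: float, expected: float) -> float:
    """This cell's contribution to the chi-square index."""
    return (observed - expected) ** 2 / expected

# Figures quoted in the example: 21 and 5 are assumed to be the row and column totals
expected = expected_frequency(row_total=21, column_total=5, overall_total=31)
contribution = cell_contribution(observed=3, expected=expected)

print(round(expected, 2), round(contribution, 4))
```

Summing the contributions of all eight cells gives the chi-square index for the table.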

www.invensislearning.com
93
3.5 Hypothesis Testing with Non-Normal Data
3.5.8 Chi-Square Distribution

The table is populated by:


• Calculating and adding the estimated population parameters;
• Estimating the observed frequency; and
• Calculating the final chi-square index.


H0: Proportion of wins in Australia or abroad is independent of the country played against
Ha: Proportion of wins in Australia or abroad is dependent on the country played against
χ² Critical = 6.251 and
χ² Calculated = 1.36
Result: Since the calculated value is less than the critical value, H0 cannot be rejected. The data do not
show that the proportion of wins of the Australian hockey team in Australia or abroad depends on the country played against.


Chi-Square Test – Example: Interpretation of Results


There is a different chi-square distribution for each number of degrees of freedom.
For a chi-square test on a contingency table, the degrees of freedom are calculated from the number of
rows and columns in the table: (rows − 1) × (columns − 1).
The purpose of the Chi-square goodness-of-fit test is to test the hypotheses
H0: The data follow a specified distribution
HA: The data do not follow a specified distribution


To conduct the Chi-square test the steps are given below:


1. State the null and alternative hypothesis (H0, HA).
2. Arrange a random sample of size n into a frequency histogram of k class intervals.
3. Determine Oi = observed frequency in the ith class interval.
4. Determine Ei = expected frequency in the ith class interval using the hypothesis distribution.
5. State the α value.
6. Compute the test statistic $\chi^2_0 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$.
7. Compute the critical value. Reject H0 if $\chi^2_0 > \chi^2_{\alpha,\,k-p-1}$. The value of p is the number of
parameters estimated.
8. State the conclusion of the test.
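
The steps above can be sketched for a simple case. The example assumes a hypothetical fair-die hypothesis with 60 hypothetical rolls, so no parameters are estimated (p = 0), and 11.070 is the 5% chi-square table value for 5 degrees of freedom.

```python
def chi_square_goodness_of_fit(observed, expected, params_estimated=0):
    """Return the chi-square statistic and its degrees of freedom (k - p - 1)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1
    return chi2, df

# Hypothetical counts of each face in 60 rolls of a die (illustration only)
observed = [8, 12, 9, 11, 6, 14]
expected = [sum(observed) / 6] * 6     # H0: every face is equally likely
CHI2_CRIT_DF5 = 11.070                 # chi-square table value for df = 5, alpha = 0.05

chi2, df = chi_square_goodness_of_fit(observed, expected)
reject_h0 = chi2 > CHI2_CRIT_DF5

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```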


Chi-Square Test for Association


