Six Sigma Green Belt 3.ANALYSE (IASSC)
www.invenislearning.com
3.0 Analyze Phase
3.1 Patterns of Variation
Lean Tools - Value Add (VA) and Non-Value Add (NVA) Analysis
• The objective of the VA/NVA analysis is to:
• Identify and eliminate the hidden costs that do not add value to the customer
• Reduce unnecessary process complexity, and thus errors
• Method:
• Classify each process step as value-added (also known as "customer value-add"), business non-value-add
(sometimes called "required waste"), and non-value-add
• Add up the time spent in each category
• Decide what to do next.
• Value-add tasks should be optimized and standardized
• Business non-value-add tasks should be checked with the customer and, where possible, minimized or
eliminated
• Non-value-add activities should be eliminated
• Value-Added or Customer Value-Added
• Must be performed to meet customer needs
• Adds form or feature to the service
• Enhances service quality, enables on-time or more competitive delivery or has a positive impact on
price competition
• Customers would be willing to pay for this work if they knew you were doing it
• Non-Value-Added
• Rework, Duplicating, waiting, etc.
• Business Non-Value-Added (also called Business Value-Added)
• Internal requirements, e.g. compliance
• Lead Time
• The time between order and delivery
• Cycle Time C/T
• The time taken at each step to create a
product/service element
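• Takt Time
• The pace at which items must be completed to meet customer demand (available work time divided by customer demand)
• Process Time P/T
• The time taken to produce one item when one operator is working on one product at a time; in that case it equals C/T. In the case of batch processing, C/T = (P/T) / number of items produced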
Value stream pinpoints value add and non value add activities
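[Figure: value stream map linking Suppliers, Production, Sales, and Customer through forecasts and demand. Inventory (I) points: Components 4 weeks, then 4 days, 3 days, 5 days, 10 days; total lead time 42 days. Process times: 20 min, 42 min, 10 min, 15 min, 5 min; total 92 minutes.]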
[Figure: the same value stream map with stages Subassembly, Final Assembly, Test, Stage, and Ship, annotated with problem areas: high variation, high defect rate, excessive inventory, and long set-up times. Lead time 42 days (Components 4 weeks, then 4 days, 3 days, 5 days, 10 days); process time 92 minutes.]
Takt Time
• Takt Time Calculation
• The takt time is the amount of available work time divided by the customer demand during that time period
• Example:
• Work Schedule: 8 hours/day = Total of 480 minutes in a day
• No. of shipments to handle in a day = 150
• Takt time = 480 (minutes)/150 = One shipment for every 3.2 minutes
• Any VA step in a process map that takes longer than the Takt rate is considered a time trap
• Divide the total time for the process by the Takt time to get a rough estimate of the staff required to operate the process
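The takt time arithmetic above can be sketched in a few lines of Python. The 480 minutes and 150 shipments come from the example; the step names and step times are hypothetical values added only to illustrate the time-trap check and the rough staffing estimate.

```python
def takt_time(available_minutes: float, demand_units: float) -> float:
    """Available work time divided by customer demand for the same period."""
    if demand_units <= 0:
        raise ValueError("demand_units must be positive")
    return available_minutes / demand_units


def rough_staffing(total_process_minutes: float, takt_minutes: float) -> float:
    """Total process time divided by takt time (a rough headcount estimate)."""
    return total_process_minutes / takt_minutes


# Values from the example: 8 hours/day and 150 shipments/day
takt = takt_time(480, 150)

# Hypothetical step times in minutes (not from the source)
step_minutes = {"pick": 2.5, "pack": 4.0, "label": 1.5}

# Any step slower than the takt time is a candidate time trap
time_traps = [name for name, minutes in step_minutes.items() if minutes > takt]
staff = rough_staffing(sum(step_minutes.values()), takt)

print(f"Takt time: {takt:.1f} min per shipment")
print(f"Time traps: {time_traps}")
print(f"Rough staffing estimate: {staff:.1f}")
```

The staffing figure is only a starting point; it ignores breaks, rework, and variation in step times.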
3.1.1 Multi-Vari Analysis
Multi-Vari Studies
Multi-Vari studies analyze variation, investigate process stability, identify investigation areas, and
break down the variation.
They classify variation sources into three major types: positional (within a unit), cyclical (unit to unit), and temporal (over time).
• Use a multi-vari chart as a preliminary tool to investigate variation in your data, including cyclical
variations and interactions between factors.
• A multi-vari chart provides a graphical representation of the relationships between factors and a
response.
• The multi-vari chart displays the means at each factor level for every factor. In Minitab, each multi-vari
chart can display up to four factors.
• For Example, a manufacturer produces plastic pipes using two different machines with three
temperature settings. The quality engineer is concerned about the consistency of pipe diameters from
the different machines and settings. The engineer creates a multi-vari chart to investigate the variation in
pipe diameters.
3.1.2 Classes of Distributions
The data obtained from the measurement phase exhibit a variety of distributions, depending on the data type
and its source.
The methods used to describe the parameters for classes of distribution are:
Types of Distributions
The two types of distribution are discrete (for example binomial and Poisson) and continuous (for example normal, chi-square, t, and F).
Binomial Distribution
The binomial distribution is a probability distribution for discrete data.
Characteristics of Binomial Distribution:
• Predicts sample behavior
• Describes the discrete data as a result of a particular process
\[ P(r) = {}^{n}C_{r} \, p^{r} (1 - p)^{n - r} \]
where P(r) = probability of exactly r successes out of a sample size of n, and p = probability of success on each trial
Some of the key calculations of binomial distribution are shown.
Term | Formula
Mean | \( \mu = np \)
Standard Deviation | \( \sigma = \sqrt{np(1 - p)} \)
where n = sample size
p = probability of success
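As a minimal sketch, the binomial probability, mean, and standard deviation can be computed with the Python standard library. The sample size of 10 and the 10% defect probability are hypothetical values chosen only for illustration.

```python
from math import comb, sqrt


def binomial_pmf(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binomial_mean(n: int, p: float) -> float:
    return n * p


def binomial_std(n: int, p: float) -> float:
    return sqrt(n * p * (1 - p))


# Hypothetical case: 10 units sampled, each defective with probability 0.10
n, p = 10, 0.10
p_no_defects = binomial_pmf(0, n, p)
p_at_most_one = p_no_defects + binomial_pmf(1, n, p)

print(f"P(0 defects)   = {p_no_defects:.4f}")
print(f"P(<= 1 defect) = {p_at_most_one:.4f}")
print(f"mean = {binomial_mean(n, p):.2f}, std dev = {binomial_std(n, p):.4f}")
```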
Poisson Distribution
The Poisson distribution is an application of population knowledge to predict sample behaviour.
Characteristics of Poisson Distribution:
• Deals with counts of occurrences, which can take any non-negative integer value
• Used where the probability of success in each trial is very small
\[ P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]
where P(x) = probability of exactly x occurrences in a Poisson distribution
λ = mean number of occurrences during the interval
x = number of occurrences desired
e = base of the natural logarithm (approximately 2.71828)
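The formula can be evaluated directly. The sketch below is illustrative only: the rate λ = 4.2 is an assumed value, not one stated in the source, and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam**x * exp(-lam) / factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """Probability of x or fewer occurrences."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


ASSUMED_RATE = 4.2  # assumed mean number of occurrences per interval

# P(X < 2) is the same as P(X <= 1) because X only takes integer values
p_fewer_than_two = poisson_cdf(1, ASSUMED_RATE)
print(f"P(X < 2) = {p_fewer_than_two:.3f}")
```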
P(X<2) = 0.078
Normal Distribution
The Normal or Gaussian distribution is a continuous probability
distribution, illustrated as N (µ, σ).
• It has a higher frequency of values around the mean and
fewer occurrences away from it.
• It is used as a first approximation to describe real-valued
random variables that tend to cluster around a single mean
value.
• It is a bell-shaped curve and is symmetrical.
• The total area under the normal curve is 1.
[Figure: Normal Distribution with Mean = 100 and Standard Deviation = 10]
\[ Z = \frac{Y - \mu}{\sigma} \]
where Z = number of standard deviations between Y and µ
Y = value of the data point of concern
µ = mean of the population
σ = standard deviation of the population
Q Suppose the time taken to resolve customer problems follows a normal distribution with a mean
of 250 hours and a standard deviation of 23 hours. What is the probability that a problem
resolution will take more than 300 hours?
A Given:
● Y = 300
● µ = 250
● σ = 23
Using the formula: \( Z = \frac{300 - 250}{23} = 2.17 \)
● From a Normal Distribution Table, the area under the curve to the left of Z = 2.17 is 0.98499
● Thus, the probability that a problem is resolved in less than 300 hours is 98.5%
● The chance of a problem resolution taking more than 300 hours is 1.5%
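The table lookup in this answer can be checked numerically. This is a minimal sketch that builds the standard normal cumulative probability from `math.erf`; the helper names are hypothetical. Because it does not round Z to two decimals first, the result differs slightly from the table value.

```python
from math import erf, sqrt


def z_score(y: float, mu: float, sigma: float) -> float:
    """Number of standard deviations between y and the mean."""
    return (y - mu) / sigma


def normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


z = z_score(300, 250, 23)
p_less_than_300 = normal_cdf(z)
p_more_than_300 = 1 - p_less_than_300

print(f"Z = {z:.2f}")
print(f"P(resolution < 300 h) = {p_less_than_300:.4f}")
print(f"P(resolution > 300 h) = {p_more_than_300:.4f}")
```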
Chi-Square Distribution
If we obtain a random sample X1, X2, …, Xn of size n from a population that is normally distributed with
mean µ and finite variance σ², the random variable
\[ \chi^{2} = \frac{(n - 1)s^{2}}{\sigma^{2}} \]
is distributed as a chi-square distribution with n − 1 degrees of freedom, where s² is the sample variance.
The formula for χ² will be useful later when we discuss hypothesis testing and confidence intervals.
T-Distribution
A t-distribution is most appropriate to be used when:
• The sample size <30;
• Population standard deviation is not known; and
• Population is approximately normal.
F-Distribution
The F-distribution is the distribution of a ratio of two Chi-square variables, each divided by its degrees of freedom. A specific
F-distribution is denoted by the degrees of freedom for the numerator Chi-square and the degrees of freedom for the denominator Chi-square.
\[ F_{calculated} = \frac{s_1^{2}}{s_2^{2}} \]
where s1² and s2² are the sample variances of the two processes.
Refer to an F-table to find the critical F value at α and the degrees of freedom of the samples from the two
different processes (df1 and df2)
3.2 Inferential Statistics
3.2.1 Understanding Inference
Types of Statistics
Statistics refers to the science of collection, analysis, interpretation, and presentation of data. There are two
major types of statistics: descriptive statistics and inferential statistics.
Inferential statistics is a set of methods used to draw conclusions or inferences about characteristics of
populations based on data from a sample. Typical characteristics of interest are the mean and the standard deviation
calculated for a population. The objective of statistical inference is to estimate such population
characteristics from the information contained in a sample.
3.2.3 Central Limit Theorem
Central Limit Theorem (CLT) states that the distribution of sample means approaches a normal distribution as the
sample size increases, whatever the shape of the population distribution. A sample size greater than 30 is the usual rule of thumb.
• When the sample size is greater than 30, the distribution of the sample mean is approximately normal and centred on the population mean.
• In such cases, the Standard Error of Mean (SEM = σ/√n), which represents the variability between the
sample means, is small.
Selecting a sample size also depends on the concept called Power of the Test.
• CLT aids in making inferences from the sample statistics about the population
parameters irrespective of the distribution of the population.
• CLT becomes the basis for calculating the confidence interval for a hypothesis
test as it allows the use of a standard normal table.
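A small simulation can make the theorem concrete. The sketch below assumes a strongly skewed exponential population (mean 1, standard deviation 1), a sample size of 36, and 2000 repeated samples; all of these are illustrative choices, not values from the source.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 36
NUM_SAMPLES = 2000

# Draw many samples from a skewed population and keep each sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
observed_sem = statistics.stdev(sample_means)
theoretical_sem = 1.0 / SAMPLE_SIZE**0.5  # sigma / sqrt(n) with sigma = 1

print(f"Mean of sample means: {mean_of_means:.3f}")
print(f"Observed SEM:         {observed_sem:.3f}")
print(f"Theoretical SEM:      {theoretical_sem:.3f}")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to σ/√n.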
3.3 Hypothesis Testing
3.3.1 General Concepts and Goals of Hypothesis Testing
Example: Based on the hypothesis test, Nutri Worldwide Inc. implemented a trading strategy.
The returns:
• Are economically significant when logical reasons are examined before implementation.
• May not be significant when the statistically proven strategy is implemented directly.
• May be economically insignificant due to taxes, transaction costs, and risks.
Examples of the Null Hypothesis and Alternate Hypothesis
1. A cement plant has found that the historical mean strength of cement is 25 units. The Company wants to
assess whether the mean strength continues to be the same.
• In the null hypothesis, we assume that the mean strength (25 units) has not changed. Therefore the null and alternate hypotheses are written as:
• Ho: µ = 25
• H1: µ ≠ 25
• The number of tails is 2, as we want to assess whether the mean strength has changed in either direction
2. We want to evaluate whether a new incentive scheme has increased the mean daily production of the
company.
• The historical mean is µo. In the null hypothesis, we will assume that the mean production level has not
changed.
• Therefore the null and alternate hypothesis would be written as
• Ho: µ = µo
• H1: µ > µo
• The number of tails =1 (right tail) as we want to assess whether the mean production has increased.
3. A company has appointed a new courier service. They wish to assess whether the package is delivered
faster than before.
• In the Null hypothesis, we will assume that the mean delivery time µo has not changed; the null and
alternate hypothesis will, therefore, be written as
• Ho: µ = µo
• H1: µ < µo
• The number of tails = 1 (Left tail) as we want to assess whether the mean service time has reduced.
Null Hypothesis
• Represented as H0
• Cannot be proved, only rejected
• Example: Movie is good
Alternate Hypothesis
• Represented as Ha
• Challenges the null hypothesis
• Example: Movie is not good
For example, suppose an estimate is needed for the average coating thickness for a population of 1000
circuit boards received from a supplier. Rather than measure the coating thickness on all 1000 boards, one
might randomly pick 36 boards for measurement. Suppose the average coating thickness of these 36
boards is 0.003, and the standard deviation of the 36 coating measurements is 0.0005. The standard deviation
is assumed known from past experience. Determine the 95% confidence interval for the true mean.
• As the sample size is greater than 30 and the standard deviation is assumed known, we use the Z table, which gives Zα/2 = 1.96
• α = 0.05, X̄ = 0.003, σ = 0.0005, n = 36
• The confidence interval is given by:
• \( \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)
• Substituting the values in the formula, we obtain
• \( 0.003 - 1.96 \times \frac{0.0005}{\sqrt{36}} \le \mu \le 0.003 + 1.96 \times \frac{0.0005}{\sqrt{36}} \)
• 0.00284 ≤ μ ≤ 0.00316
• Thus the 95% confidence interval for the mean is (0.00284, 0.00316)
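The same interval can be reproduced with a short function. This is a sketch under the example's own assumptions (known standard deviation, Z = 1.96 for 95% confidence); the function name is hypothetical.

```python
from math import sqrt


def z_confidence_interval(x_bar: float, sigma: float, n: int, z: float = 1.96):
    """Confidence interval for the mean when sigma is assumed known."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the coating thickness example
lower, upper = z_confidence_interval(x_bar=0.003, sigma=0.0005, n=36)
print(f"{lower:.5f} <= mu <= {upper:.5f}")
```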
3.3.2 Significance; Practical vs. Statistical
Is the P-value < 0.05?
• YES: Reject the Null Hypothesis
• NO: Cannot reject the Null Hypothesis
Truth Table
Decision | Truth: Ho is true | Truth: Ha is true
Do Not Reject Ho | Correct Decision | Type II Error (β), or Consumer risk
Reject Ho | Type I Error (α), or Producer risk | Correct Decision
Type I Error: you reject Ho when Ho is true.
Type II Error: you do not reject Ho when Ha is true.
α is the probability of making a Type I error. The P-value is the probability, assuming Ho is true, of obtaining a result at least as extreme as the one observed. When α = 0.05, P-value < 0.05 is our judgment criterion.
We say that the decision is made at the 95% (1 − α) confidence level.
3.3.3 Risk; Alpha & Beta
Alpha risk is the risk of incorrectly deciding to reject the null hypothesis. If the confidence level is 95%,
then the alpha risk is 5% or 0.05.
Alpha risk is also called False Positive and Type I Error.
Beta risk is the risk of failing to reject the null hypothesis when it is false, for example deciding that a part is not defective when it really is.
With a beta risk of 10%, there is a 10% chance that the decision will be made that the part is not defective when in reality it is defective.
The Null Hypothesis is technically never proven true. It is either "failed to reject" or "rejected."
"Failed to reject" does not mean accepting the null hypothesis, since it is established only to be proven false by testing
the sample of data.
Hypothesis Testing Possible Scenarios
• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from two or multiple samples (for Y)
• Take two or more samples for the Y data from the population and conduct appropriate test(s) to draw inferences
about the population
• During Analyse Phase, to establish statistical significance for the estimation of mean, variance, etc. for the
population from one sample (for X and Y)
• Take one sample for the X and Y data from the respective populations and conduct appropriate test(s) to draw
inferences about the populations
• This helps in understanding which X has a max impact on Y and therefore shortlist critical Xs
• During Improve phase, repeat the appropriate tests above to verify and confirm process improvements
3.3.4 Types of Hypothesis Testing
3.4 Hypothesis Testing with Normal Data
Examples of Parametric Hypothesis Testing
• 1-Sample t-Test (Mean vs. Target): This test is used to compare the mean of a process with a target value, such as an ideal goal, to determine whether they differ.
• 1-Sample Standard Deviation: This test is used to compare the standard deviation of the process with a target value, such as a benchmark, to determine whether they differ. It is often used to evaluate how consistent a process is.
• 2-Sample t (Comparing 2 Means): Two sets of different items are measured, each under a different condition, so the measurements of one sample are independent of the measurements of the other sample. Example: with two populations and two samples, this test can determine whether the average expenditure of male customers is equal to the average expenditure of female customers.
• Paired t: The same set of items is measured under 2 different conditions; therefore, the 2 measurements of the same item are dependent or related to each other.
• 2-Sample Standard Deviation: This test is used when comparing the standard deviations of 2 samples.
• Standard Deviation test: This test is used when comparing the standard deviations of more than 2 samples.
• Generally, z-tests are used when we have large sample sizes (n > 30), whereas t-tests are most helpful with a smaller
sample size (n < 30). Both methods assume a normal distribution of the data, but z-tests are most useful when
the population standard deviation is known.
• A t-test is usually done to compare the means of two treatments. For instance, to compare the performance of a
machine before and after some adjustments, the mean of one sample of products taken prior to the adjustments
can be compared to the mean of another sample taken after the adjustments. In that case, a t-test can be useful.
• Hypothesis testing based on the t-test is conducted using the degrees of freedom and the confidence
level, but when two sample means are being compared, there is always room for error. If alpha = 0.05,
there is a 5% chance of rejecting a null hypothesis that happens to be true. If, for instance, three sample
means A, B, and C are being compared using the t-test with a confidence level of 95%, two factors are compared at a
time.
• A is compared with B, then A with C, and then B with C. Every time two factors are compared, there is a 0.05
probability of rejecting a true null hypothesis. Therefore, when three factors are compared using the t-test,
the probability of making a Type I error is inflated. In order to limit this Type I error inflation, we can
use analysis of variance (ANOVA).
• ANOVA is a hypothesis test used when more than two factor means are being compared.
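The size of the Type I error inflation can be estimated with a short calculation. The sketch below assumes the pairwise comparisons are independent, which is a simplification; under that assumption the chance of at least one false rejection is 1 − (1 − α)^m for m comparisons.

```python
from itertools import combinations

alpha = 0.05
groups = ["A", "B", "C"]

# Every pair of groups needs its own t-test: A-B, A-C, B-C
pairs = list(combinations(groups, 2))

# Probability of at least one false rejection across all pairwise tests,
# assuming the tests are independent (a simplifying assumption)
familywise_error = 1 - (1 - alpha) ** len(pairs)

print(f"Pairwise comparisons: {len(pairs)}")
print(f"Chance of at least one Type I error: {familywise_error:.3f}")
```

With three groups the combined risk is already close to 14% rather than 5%, which is the motivation for a single ANOVA test.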
3.4.1 1 & 2 Sample t-tests
1-Sample t-test
• Use 1-Sample t to estimate the mean of a population and to compare it to a target value or a reference
value when you do not know the standard deviation of the population. Using this analysis, you can do the
following:
• Determine whether the population mean differs from the hypothesized mean that you specify.
• Calculate a range of values that is likely to include the population mean.
• For example, a quality analyst uses a 1-sample t-test to determine whether the average thread length of
bolts differs from the target of 20 mm. If the mean differs from the target, the analyst uses the confidence
interval to determine how large the difference is likely to be and whether that difference has practical
significance.
• Where to find this analysis
• To perform a 1-sample t-test, choose Stat > Basic Statistics > 1-Sample t.
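Outside Minitab, the same test statistic can be computed by hand. The following is a minimal sketch using only the Python standard library; the ten thread lengths are hypothetical data, and the critical value 2.262 is the two-tailed t-table value for α = 0.05 with 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, target):
    """Return the t statistic and degrees of freedom for mean vs. target."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)  # sample std dev, not population
    t_stat = (mean(data) - target) / standard_error
    return t_stat, n - 1


# Hypothetical thread lengths in mm (not from the source)
lengths = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.4, 19.7, 20.1, 20.0]

t_stat, df = one_sample_t(lengths, target=20.0)

T_CRITICAL = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a t-table)
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("Reject H0" if reject_null else "Cannot reject H0")
```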
2-Sample t-test
The average heights of men in two different sets of people are compared to see if the means are significantly different.
For this test, the sample sizes, means, and variances are required to calculate the value of t. Two samples of sizes
n1 = 125 and n2 = 110 are taken from the two populations. The mean of sample 1 is 167.3 and the mean of sample 2 is
165.8. The standard deviations of samples 1 and 2 are 4.2 and 5.0 respectively.
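The worked solution for this example is not present in the text, so the following is only a sketch of one common approach: the unequal-variance (Welch) form of the two-sample t statistic, computed from the summary figures given above. Whether the original slide pooled the variances instead is not known.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and approximate degrees of freedom
    without assuming equal variances (Welch's form)."""
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df


# Summary statistics from the heights example
t_stat, df = welch_t(167.3, 4.2, 125, 165.8, 5.0, 110)
print(f"t = {t_stat:.2f}, approximate df = {df:.0f}")
```

With more than 200 degrees of freedom, the two-tailed critical value at α = 0.05 is close to 1.97, so a t value above that would suggest the two mean heights differ.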
3.4.2 1 Sample Variance
Results: The F-test statistic (1.273) is not greater than the critical value (1.74). Therefore, at 5% significance level,
the null hypothesis cannot be rejected.
3.4.3 One Way Anova
• A chemical engineer wants to compare the hardness of four blends of paint. Six samples of each paint blend
were applied to a piece of metal. The pieces of metal were cured. Then each sample was measured for hardness.
In order to test for the equality of means and to assess the differences between pairs of means, the analyst uses
one-way ANOVA with multiple comparisons.
• Open the sample data, PaintHardness.MTW.
• Choose Stat > ANOVA > One-Way.
• Select Response data are in one column for all factor levels.
• In Response, enter Hardness.
• In Factor, enter Paint.
• Click the Comparisons button, then select Tukey
• Click OK in each dialog box.
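For readers without Minitab, the F statistic behind one-way ANOVA can be sketched in plain Python. The hardness readings below are hypothetical and are not the contents of the Minitab sample file; the critical value 3.10 is the F-table value for α = 0.05 with 3 and 20 degrees of freedom. The Tukey comparisons are not reproduced here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the readings around their own group mean
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical hardness readings: four blends, six samples each
blends = [
    [14.2, 15.1, 13.8, 16.0, 14.9, 15.4],
    [9.8, 10.5, 8.9, 11.2, 10.1, 9.4],
    [12.7, 13.5, 12.1, 14.0, 13.2, 12.9],
    [18.3, 17.6, 19.1, 18.8, 17.9, 18.5],
]

f_stat, df_b, df_w = one_way_anova_f(blends)
F_CRITICAL = 3.10  # alpha = 0.05, df = (3, 20), from an F-table

print(f"F = {f_stat:.2f} with df = ({df_b}, {df_w})")
print("Reject H0" if f_stat > F_CRITICAL else "Cannot reject H0")
```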
[Figure: 95% Bonferroni Confidence Intervals for StDevs, by Paint blend (Blend 2, Blend 3, Blend 4 visible); horizontal axis from 0 to 20]
3.5 Hypothesis Testing with Non-Normal Data
Non-Parametric Hypothesis Test
• Non-parametric tests are used when the data are not normal. Examples of non-parametric tests,
most of which focus on the median, are given below:
• Mann-Whitney
• Kruskal Wallis
• Moods Median
• Friedman
• 1 Sample Sign
• 1 Sample Wilcoxon
• One and Two Sample Proportion
• Chi Square tests
3.5.1 Mann-Whitney Test
3.5.2 Kruskal-Wallis Test
The Kruskal-Wallis test is also a non-parametric test, used for testing whether samples originate from the same distribution.
Characteristics of the Kruskal-Wallis test are as follows:
• It is a one-way analysis of variance by ranks.
• Medians of two or more samples are compared to determine whether the samples come from the same population.
• Unlike the analogous one-way analysis of variance, it does not assume a normal distribution of the residuals.
• The null hypothesis is that the medians of all the groups are equal, and
• The alternative hypothesis is that at least one population median of one group is different from
that of at least one other group.
A health administrator wants to compare the number of unoccupied beds for three hospitals in the same city. The administrator
randomly selects 11 different days from the records of each hospital and enters the number of unoccupied beds for each day.
To determine whether the median number of unoccupied beds differs, the administrator uses the Kruskal-Wallis test.
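The raw bed counts for this example are not included in the text, so the sketch below uses hypothetical counts for three hospitals to show how the Kruskal-Wallis H statistic is built from ranks. Tied values receive average ranks, but H is not adjusted for ties here. The critical value 5.991 is the chi-square table value for α = 0.05 with 2 degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (without tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum**2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical unoccupied-bed counts (not from the source)
hospitals = [
    [12, 16, 9, 20, 15],
    [31, 28, 35, 26, 40],
    [17, 14, 19, 11, 22],
]

h_stat = kruskal_wallis_h(hospitals)
CHI2_CRITICAL = 5.991  # alpha = 0.05, df = groups - 1 = 2

print(f"H = {h_stat:.2f}")
print("Reject H0" if h_stat > CHI2_CRITICAL else "Cannot reject H0")
```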
Hospital | N | Median | Mean Rank | Z-Value
1 | 11 | 16 | 14.0 | -1.28
2 | 11 | 31 | 23.3 | 2.65
3 | 11 | 17 | 13.7 | -1.37
Overall | 33 | | 17.0 |
3.5.3 Mood’s Median Test
The Mood’s median is a non-parametric test that is used to test the equality of medians from two or
more different populations. This test works when:
• The output (Y) variable is continuous, discrete-ordinal or discrete-count, and
• The input (X) variable is discrete with two or more attributes.
3.5.4 Friedman Test
Friedman test is a form of non-parametric test that does not make any assumptions on the shape and
origin of the sample.
• It allows smaller sample data sets to be analysed, and
• Unlike ANOVA, it does not require the dataset to be randomly sampled from normally distributed
populations with equal variances.
Note: The test uses the null hypothesis where the population medians of each treatment are statistically
identical to the rest of the group.
3.5.5 1 Sample Sign Test
The 1 Sample Sign test is the simplest of all the non-parametric tests and can be used instead of a
one sample t test.
• Here, H0 states that the median of the population from which the sample is drawn equals the
hypothesized (assumed) median.
Steps involved in 1 Sample Sign test are as follows:
1. Count the values that are larger than the hypothesized median
2. Count the values that are smaller than the hypothesized median
3. Check if there are significantly more positives (or negatives) than expected
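The three steps can be sketched as a short function. Under H0 each value is equally likely to fall above or below the hypothesized median, so the count of positives follows a binomial distribution with p = 0.5. The data values and the hypothesized median of 10 are hypothetical.

```python
from math import comb


def sign_test(data, hypothesized_median):
    """Return (count above, count below, two-sided p-value)."""
    above = sum(1 for x in data if x > hypothesized_median)
    below = sum(1 for x in data if x < hypothesized_median)
    n = above + below  # values equal to the median are dropped
    k = min(above, below)
    # Probability of a split at least this lopsided when p = 0.5
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    p_value = min(1.0, 2 * one_tail)
    return above, below, p_value


# Hypothetical measurements tested against a hypothesized median of 10
measurements = [12, 11, 14, 9, 13, 15, 12, 16, 11, 13]
above, below, p_value = sign_test(measurements, 10)

print(f"Above: {above}, below: {below}, p-value: {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```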
3.5.6 1 Sample Wilcoxon Test
The 1 Sample Wilcoxon test, also known as the Wilcoxon Signed Rank test, is a non-parametric test.
This test is:
• Equivalent to the parametric One Sample t-Test, and
• More powerful than the non-parametric 1 Sample Sign Test.
The conclusion in this test is that if the data are consistent with the hypothesized mid-point (median), you
fail to reject the null hypothesis. If not, you reject the null hypothesis in favor of the alternate hypothesis.
3.5.7 One and Two Sample Proportion Test
3.5.8 Chi-Square Distribution
\[ \chi^{2}_{Calculated} = \sum \frac{(f_o - f_e)^{2}}{f_e} \]
Where,
• χ² Calculated = chi-square index
• fo = an observed frequency
• fe = an expected frequency
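As an illustration of the formula, the sketch below computes the chi-square index for a table of observed counts, deriving each expected frequency as row total × column total / grand total. The win/loss counts are hypothetical, and the critical value 3.841 is the chi-square table value for α = 0.05 with 1 degree of freedom.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected frequency if rows and columns were independent
            f_e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = home / away, columns = wins / losses
observed = [
    [18, 12],
    [14, 16],
]

chi2, df = chi_square_statistic(observed)
CHI2_CRITICAL = 3.841  # alpha = 0.05, df = 1

print(f"Chi-square = {chi2:.3f} with {df} degree(s) of freedom")
print("Reject H0" if chi2 > CHI2_CRITICAL else "Cannot reject H0")
```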
H0: Proportion of wins in Australia or abroad is independent of the country played against
Ha: Proportion of wins in Australia or abroad is dependent on the country played against
χ2 Critical = 6.251 and
χ2 Calculated = 1.36
Result: Since the calculated value is less than the critical value, H0 cannot be rejected. There is no evidence that the
proportion of wins of the Australia hockey team, in Australia or abroad, depends on the country played against.
7. Compute the critical value. Reject H0 if \( \chi^{2}_{0} > \chi^{2}_{\alpha,\,k-p-1} \). The value of p is the number of
parameters estimated.
8. State the conclusion of the test.