Section 4 - Analyze Phase


Classes of Distribution

Learning Objectives

By the end of this lesson, you will be able to:

Link the value of a random variable with its probability of occurrence

Differentiate between discrete and continuous probability distributions

List the types of discrete and continuous probability distributions
Frequency Distribution

It is a graphical or a tabular representation that displays the number of observations within a given interval.

The interval size depends on the data being analyzed and the goals of the analyst.

Frequency Distribution

They are generally associated with the charting of a normal distribution.

They show the observations of probabilities divided among standard deviations.

Example: Traders use frequency distributions to check the price actions and identify trends.
Probability Distribution

Frequency distribution: the exact number of times a data point occurs

Probability distribution: the probability of occurrence of the given data point
Probability Distribution

The probability of an event refers to the likelihood that the event will occur.

Probability of event A = P(A)

P(A) = 0 Event A will definitely not occur.

P(A) ≅ 0 There is a small chance that event A will occur.

P(A) = 0.5 There is a 50-50 chance that event A will occur.


P(A) ≅ 1 There is a strong chance that event A will occur.

P(A) = 1 Event A will definitely occur.


Probability Distribution

In a statistical experiment, the sum of probabilities for all possible outcomes is equal to one.

An experiment with three possible outcomes: A, B, and C

P(A) + P(B) + P(C) = 1

A probability distribution is a table or an equation that links each possible value of a random
variable with its probability of occurrence.
Types of Probability Distributions

Discrete Continuous
probability probability
distribution distribution
Discrete Probability Distribution

It describes the probability of occurrence of each value of a discrete random variable.

Example: Tossing a coin twice has four equally likely outcomes:

Heads and heads

Heads and tails

Tails and heads

Tails and tails


Discrete Probability Distribution

The probability distribution of a discrete random variable can always be represented by a table.

Number of tails (X)    Probability P(X)
0                      0.25
1                      0.50
2                      0.25
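The table above can be reproduced by enumerating the equally likely outcomes of two coin tosses; a minimal Python sketch:

```python
from itertools import product
from collections import Counter

# Enumerate the four equally likely outcomes of tossing a coin twice
outcomes = list(product(["H", "T"], repeat=2))

# X = number of tails in each outcome
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# P(X) = favorable outcomes / total outcomes
distribution = {x: count / len(outcomes) for x, count in tail_counts.items()}
print(distribution)  # {0: 0.25, 1: 0.5, 2: 0.25}
```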
Continuous Probability Distribution

It describes the probabilities of the possible values of a continuous random variable.

The probability distribution of a continuous random variable is represented by the probability density function (PDF).

The probability that a continuous random variable equals some specific value is always zero.

Probability density function (PDF):
• Y = f(X)
• Y ≥ 0 for all values of X
• The total area under the curve = 1
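The two PDF properties (non-negative everywhere, total area of 1) can be checked numerically; a sketch using the standard normal PDF as the example curve:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density function of a normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Approximate the total area under the curve with a simple Riemann sum
step = 0.001
xs = [-8 + i * step for i in range(int(16 / step))]
area = sum(normal_pdf(x) * step for x in xs)

print(round(area, 4))  # close to 1.0: the total area under a PDF is 1
```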
Continuous Probability Distribution

Continuous probability distribution for the height of trees

It is impossible to figure out the probability of any one tree measuring exactly 70 inches.
Continuous Probability Distribution

It is unlikely that any tree is exactly 70 inches tall; trees measuring 69.9 inches or 70.1 inches would both be recorded as approximately 70 inches.

Any variable on a continuous scale cannot be measured with perfect accuracy.


Common Distributions

These are common distributions that relate to each other in interesting ways.



Discrete Probability Distribution

Types

Binomial distribution Poisson distribution

Hypergeometric distribution Negative binomial distribution

Multinomial distribution
Discrete Probability Distribution: Types

Binomial distribution: A type of distribution that has two possible outcomes

Hypergeometric distribution

Multinomial distribution

Negative binomial distribution

Poisson distribution
Discrete Probability Distribution: Types

Binomial distribution

Hypergeometric distribution: Calculates the probability of successes when sampling without any replacement

Multinomial distribution

Negative binomial distribution

Poisson distribution
Discrete Probability Distribution: Types

Binomial distribution

Hypergeometric distribution

Multinomial distribution: Finds probabilities in experiments where there are more than two outcomes

Negative binomial distribution

Poisson distribution
Discrete Probability Distribution: Types

Binomial distribution

Hypergeometric distribution

Multinomial distribution

Negative binomial distribution (Pascal distribution): The number of repeated trials that produce a certain number of successes

Poisson distribution
Discrete Probability Distribution: Types

Binomial distribution

Hypergeometric distribution

Multinomial distribution

Negative binomial distribution

Poisson distribution: The number of events occurring in a given time period
• Identifies the probability of zero customers or many customers coming simultaneously
• Helps a manager plan for these events with appropriate staffing and scheduling
Continuous Probability Distribution

Normal or Gaussian distribution

Standard normal distribution

T distribution

F distribution
Continuous Probability Distribution

Normal distribution: Every curve follows the empirical rule; most outcomes will be within three standard deviations of the mean.

Standard normal distribution

T distribution

F distribution
Continuous Probability Distribution

Normal distribution

Standard normal distribution: Occurs when a normal random variable has mean μ = 0 and standard deviation σ = 1

Z = (X - μ) / σ
• Z = The z-score (standard score)
• X = The value to be standardized
• μ = The mean
• σ = The standard deviation

T distribution

F distribution
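The standardization behind the z-score can be sketched in a few lines of Python (the numbers below are hypothetical, chosen only for illustration):

```python
def z_score(x, mu, sigma):
    """Standardize a value: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: a value of 85 in a population with mean 70 and sd 10
print(z_score(85, 70, 10))  # 1.5
```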
Continuous Probability Distribution

Normal distribution

Standard normal distribution

T distribution (Student’s t-distribution): Estimates population parameters when the sample size is small or when the population variance is unknown

t = (X̄ - μ) / (s / √n)
• X̄ = The sample mean
• μ = The population mean
• s = The sample standard deviation
• n = The sample size

The t distribution is used for statistical analyses on data sets that are not appropriate for analysis using the normal distribution.

F distribution
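The t statistic defined from the sample mean, sample standard deviation, and sample size can be computed directly; a minimal sketch with hypothetical sample values:

```python
import math
import statistics as st

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), using the sample (n-1) standard deviation."""
    n = len(sample)
    x_bar = st.mean(sample)
    s = st.stdev(sample)
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical small sample tested against a population mean of 40
sample = [39, 35, 36, 41, 41, 40, 40, 44]
print(round(t_statistic(sample, 40), 3))
```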
Continuous Probability Distribution

Normal distribution

Standard normal distribution

T distribution

F distribution: Used to test whether two independent samples drawn from the normal population have the same variance, or whether two independent estimates of the population variance are homogeneous.
Key Takeaways

Frequency distribution is a graphical or a tabular representation that displays the number of observations within a given interval.

A frequency distribution gives the exact number of times a data point occurs. A probability distribution gives the probability of occurrence of the given data point.

Discrete probability distribution describes the probability of occurrence of each value of a discrete random variable.

Continuous probability distribution describes the probabilities of the possible values of a continuous random variable.
Key Takeaways

The types of discrete probability distributions include:

• Binomial distribution
• Hypergeometric distribution
• Multinomial distribution
• Negative binomial distribution
• Poisson distribution

The continuous probability distribution types are:

• Normal or Gaussian distribution
• Standard normal distribution
• T distribution
• F distribution
Inferential Statistics
Learning Objectives

By the end of this lesson, you will be able to:

Use inferential statistics to make predictions from the available data

List the types of statistical inferences

State the importance of the Central Limit Theorem


Types of Statistics

Descriptive statistics describes data in the form of a chart or a graph.

Inferential statistics allows you to make predictions or inferences from the available data.
Inference

The act or process of deriving logical conclusions from known premises

The act of reasoning from factual knowledge or evidence


Inferential Statistics

It is used to draw inferences about the process or population by modeling patterns of data.

The objective is to move from merely describing the nature of the data to being able to infer from the data what will happen in the future.
Inferential Statistics: Types of Error

Four types of error contribute to uncertainty when trying to make inferences from data:

Error in sampling

Bias in sampling

Error in measurement

Lack of measurement validity
Inferential Statistics: Types of Error

Error in sampling: Error due to differences among samples drawn at random from the population

Bias in sampling: Error due to lack of independence among random samples

Error in measurement: Error in the measurement of the samples, as quantified through MSA or Gage R&R studies

Lack of measurement validity: The measurement does not measure what it is intended to measure
Inferential Statistics: Example

Assume that you visit a supermarket and select 100 shoppers for a survey.
These shoppers are the representative sample of all customers.

Ask them if they like a particular brand of shampoo and then record their responses.
Inferential Statistics: Example

The recorded data can be used in two ways:

• Make a bar chart of yes or no answers, which is descriptive statistics

• Reason that a specific percentage of customers prefer this brand of shampoo over other brands, which is inferential statistics
Inferential Statistics

In case of inferential statistics, you take data from samples and generalize about a population.
Inferential Statistics

Ensure that the sample accurately reflects the population.

Define the population to be studied

Draw a representative sample

Use analysis methods that incorporate the sampling error
Inferential Statistics: Branches

Inferential statistics

Parameter estimation Hypothesis testing


Parameter Estimation

A statistic from the sample data is used to infer a characteristic about a population.

Example: The mean of a sample is used to infer the mean of a population.
Parameter Estimation

Describe your sample using descriptive statistics:

• Calculate the sample mean or the sample standard deviation

• Make a histogram, Pareto chart, bar chart, or a box plot

• Describe the shape of the sample probability distribution
Hypothesis Testing

• Uses sample data to answer research questions

• Helps you identify the relationship between two events

• Helps you understand whether a new product will be liked by customers
Inferential Statistics

Calculate a z-score or conduct a post-hoc test to make inferences about the population.

Inferential statistics use statistical models to compare your sample data with other samples.

Most research uses statistical models such as ANOVA, regression analysis, and various other
models to draw conclusions.
Central Limit Theorem

It states that the sampling distribution of the sample means tends toward a normal distribution as the sample size gets larger.

The average of the sample means will be similar to the actual population mean.
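The theorem can be illustrated with a small simulation; a sketch using a uniform population (true mean 0.5) and many repeated sample means:

```python
import random
import statistics as st

random.seed(42)

# Population: uniform on [0, 1], with true mean 0.5
def sample_mean(n):
    return st.mean(random.random() for _ in range(n))

# Draw many sample means; their average approaches the population mean
means = [sample_mean(30) for _ in range(5000)]
print(round(st.mean(means), 3))  # close to the population mean of 0.5
```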
Central Limit Theorem: Importance

In statistics, the normality assumption is vital for parametric hypothesis tests of the mean.

The central limit theorem helps produce a distribution that approximates a normal distribution if the sample size is large enough.
Central Limit Theorem: Importance

The central limit theorem also allows you to quantify the precision of the estimates.

• The sampling distributions of the mean cluster more tightly around the population mean as the sample size increases.
• This idea helps to accurately estimate the mean of an entire population.
• The estimate is more likely to be precise with a larger sample size.
Key Takeaways

Inferential statistics allows you to make predictions or inferences from the available data.

Estimating parameters and hypothesis testing are the two main methods used in inferential statistics.

The Central Limit Theorem states that the sampling distribution of the sample means tends toward a normal distribution as the sample size gets larger.
Hypothesis Testing
Learning Objectives

By the end of this lesson, you will be able to:

Differentiate between confidence interval and confidence level

Articulate the purpose of hypothesis testing

List the types of decision risks

Choose the right hypothesis test for the discrete data type and
continuous data type
Basics of Hypothesis Testing

Improves process capability by moving the process mean and reducing the standard deviation

Makes decisions based on the sample data, as collecting population data is expensive

Involves uncertainty about the true population parameters

Basics of Hypothesis Testing

Hypothesis testing helps to make fact-based decisions:

Are there different population parameters?

Is the difference only due to expected sample variation?
Basics of Hypothesis Testing

Hypothesis testing integrates:

The voice of the process

The voice of the business

This helps make data-based decisions to resolve problems.


Basics of Hypothesis Testing

Hypothesis testing avoids the high costs of experimental efforts by using existing data.

Data analysis can indicate a direction for experimentation, if necessary.


Basics of Hypothesis Testing

The probability of occurrence is based on a predetermined statistical confidence. Decisions are based on:

Beliefs or past experiences

Preferences or current needs

Risk or acceptable level of failure

Evidence or statistical data
Basics of Hypothesis Testing

Hypothesis testing helps determine:

• Whether a value is cause for alarm

• Whether two sets of data are different or if a statistical parameter varies from a test value of interest

• The strength of a conclusion

• Uncertainty, using some commonly accepted approaches
Hypothesis Testing: Approaches

Being objective

Disproving assumptions

Minimizing the risk of making wrong decisions
Confidence Interval

Inferential statistics: Uses data from a sample to make an inference about a population.

Confidence interval: Uses data collected from a sample to estimate a population parameter.
Confidence Interval

A confidence interval refers to the probability that a population parameter will fall
between two set values for a certain proportion of times.

Confidence level (e.g., 95% or 99%):

• It measures the degree of uncertainty or certainty in a sampling method.

• It can take any number of probabilities.
Confidence Interval

Confidence interval: A range of values that likely would contain an unknown population parameter

Confidence level: The percentage of probability that the confidence interval would contain the true population parameter

“We are 99% certain (confidence level) that most of these datasets (confidence intervals) contain the true population parameter.”
Confidence Interval

Confidence intervals help determine the likely range of the population parameter.

Example: A 95% confidence interval of 5 ± 2 means 95% confidence that the mean of the population is between 3 and 7.


Confidence Interval

Confidence intervals have a level of uncertainty:

Estimate ± Margin of Error
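The estimate-plus-margin structure can be sketched directly; a minimal Python example using the normal critical value 1.96 for 95% confidence (the sample values are hypothetical, and for small samples a t critical value would be more appropriate):

```python
import math
import statistics as st

def confidence_interval(sample, z=1.96):
    """95% CI for the mean: estimate +/- margin of error (z * s / sqrt(n))."""
    x_bar = st.mean(sample)
    margin = z * st.stdev(sample) / math.sqrt(len(sample))
    return (x_bar - margin, x_bar + margin)

# Hypothetical sample of 8 measurements
low, high = confidence_interval([4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4])
print(round(low, 2), round(high, 2))
```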


Importance of Confidence Interval

Sample statistics (such as x̄ and s) are only estimates of the population’s parameters.

• Confidence intervals quantify uncertainty, as there is variability in these estimates from sample to sample.

• They provide a range of plausible values for the population parameters.

• Any sample statistic will vary from one sample to another and from the true population or process parameter value.
Significant Difference

• Are the two distributions (with means µ1 and µ2) significantly different from each other?

• How does the number of observations affect your confidence in detecting a difference in the population means?
Significant Difference

It is difficult to determine a statistical difference.

• The confidence established statistically has an effect on the necessary sample size.
• The ability to detect a difference is directly linked to sample size.
Significant Difference: Mean
Consider two different sample sizes for the mean, test both, and see how sample size affects the conclusions you can draw about the mean.

One item:
• How close would you expect to get to the true population mean?
• How well do you think this one item represents the true mean?
• How much ability do you have to draw conclusions about the mean?

900 items:
• How close do you expect to get to the true population mean?
Significant Difference: Standard Deviation

Population with a lot of variation Population with less variation

Statistical Inferences and Confidence

• How close do you think the true mean (μ) is to your estimated mean, X̄?
• How certain do you need to be about the conclusions you make from your estimates?
Statistical Inferences and Confidence

As you tighten your estimate of the mean, the risk of being wrong increases.


To be more confident in your conclusions, relax the range in which the true mean lies.
Detecting Significance

Statistics provide a methodology to detect significant differences.

The two types of significant differences, practical and statistical, must be well
understood.

Failure to tie these two differences together is one of the most common
errors in statistics.

Examples: Differences in suppliers, shifts, or equipment


Practical vs. Statistical Difference

Practical difference Statistical difference

• A difference which results in an • A difference or change to the process


improvement of practical or economic that probably did not happen by
value to the company chance

• Example: An improvement in yield • Example: Differences in suppliers,


from 96% to 99% markets, or servers
Detecting Significance

A significant change can be detected as a mean shift or a variation reduction.

How much of a shift in the mean will offset the cost of making a change to the process?
Detecting Significance

How small or how large a delta is required?

• The larger the delta, the smaller the necessary sample size.
• The smaller the delta, the larger the necessary sample size.
Hypothesis Testing

A hypothesis test is an a priori theory relating to the differences between variables.


There is always a chance of collecting a nonrepresentative sample.

Inferential statistics allows us to estimate the probability of getting a nonrepresentative sample.


Dice Example

A single die is altered in some form to make a certain number appear more often than it rightfully should.

• Throw a die a number of times and track how many times each face occurs.

• With a standard die, each face is expected to occur 16.67% of the time.

If we threw the die five times and got five one’s, what would you conclude?
Dice Example

• P(one 1) = 0.1667

• P(five 1’s) = (0.1667)^5 ≈ 0.00013

With far less than a 0.1% chance of being wrong, we can conclude that the die was loaded.
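The arithmetic behind the dice example takes only a couple of lines:

```python
# Probability of a fair die showing a one on a single throw
p_one = 1 / 6

# Probability of five ones in five independent throws
p_five_ones = p_one ** 5
print(round(p_five_ones, 5))  # 0.00013
```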
Statistical Hypothesis Test

A hypothesis is a predetermined theory about the relationships between variables.

Statistical tests can show, with a certain degree of confidence, that a relationship exists.
Types of Hypothesis

Null hypothesis:
• Represented by Ho
• There is no difference or relationship
• P-value is greater than 0.05
With the assumption that the null hypothesis is true, you can only reject or fail to reject the null hypothesis.

Alternative hypothesis:
• Represented by Ha
• There is a difference or relationship
• P-value is less than 0.05
If the null hypothesis is rejected, you have data that supports the alternative hypothesis.
Steps for Hypothesis Test

1. State the practical problem

2. State the statistical problem
   a. HO: ___ = ___
   b. HA: ___ ≠, >, < ___

3. Select the appropriate statistical test and risk levels
   a. α = .05
   b. β = .10

4. Establish the sample size required to detect the difference

5. State the statistical solution

6. State the practical solution


Steps for Hypothesis Test

Alpha may change depending on the problem.

• An alpha of 0.05 is common in most manufacturing projects.
• In transactional projects, an alpha of 0.10 is common when dealing with human behavior.
• An alpha of 0.01 is only used when the null hypothesis should not easily be rejected.
P-Values

Any differences between observed data and claims made under the null
hypothesis may be real or due to chance.


Hypothesis tests determine the probabilities of these differences occurring solely due to
chance and call them P-values.
P-Values

The alpha level of a test (level of significance) represents the yardstick against
which P-values are measured.

H0 is rejected if the P-value is less than the alpha level.

Commonly used alpha levels are 1%, 5%, and 10%.
Decision Risks

The key decision risks are the type I error (α risk), the type II error (β risk), and the sample size (n).

Alpha Risk

Type I error (α):

• It is called the producer’s risk.

• It is the probability that we could be wrong in saying that something is different.

• It is an assessment of the likelihood that the observed difference could have occurred by random chance.

• It is the primary decision-making tool for most statistical tests.

• It is the risk of implementing a change when you should not.
Alpha Risk

Alpha risk is typically set lower than beta risk: we are more hesitant to make a mistake in claiming significance than to overlook an X which is never revealed.
Alpha Risk: Formula

α = The probability of making a type I error

α = The probability of rejecting the null hypothesis when it is true


Alpha Risk

Alpha risks are expressed relative to a reference distribution. Distributions include:

T-distribution

Z-distribution

χ²-distribution

F-distribution
Alpha Risk

The alpha level is represented by the shaded tail areas (regions of doubt): sample results in these areas lead to rejection of the null hypothesis, while results in the central region are accepted as chance differences.

Beta Risk

Type II error (β):

• It is called the consumer’s risk.

• It is the probability that we could be wrong in saying that two or more things are the same when they are different.

• It is the failure to recognize an improvement.

Beta Risk: Formula

β = The probability of making a type II error

β = The probability of failing to reject the null hypothesis when it is false


Beta Risk

Beta and sample size are very closely related. When calculating sample size, enter the
power of the test which is one minus beta.

Power = 1 - β

This establishes a sample size that will allow the proper overlap of distributions.
Beta Risk

If two populations' means differ by a very small amount, then we are likely to conclude
that the two populations are the same.

Beta only comes into play if the null hypothesis truly is false.

The more false it is, the greater your chances of detecting it, and the
lower the beta risk.

The power of a hypothesis test is its ability to detect an effect of a given magnitude.
Avoid Common Pitfalls

Statistical decision vs. actual conditions:

                      Not Different (Ho is True)   Different (Ho is False)
Fail to Reject Ho     Correct decision             Type II error
Reject Ho             Type I error                 Correct decision

• The decision is about Ho and not Ha.
• Check whether the contention of Ha was upheld.
• Ho is on trial.
• When a decision is made, remember that:
  o Nothing has been proved
  o It is just a decision
  o All decisions can lead to type I and type II errors
Avoid Common Pitfalls

• If the decision is to reject Ho, then there is sufficient evidence at the α level of significance to support the alternative hypothesis.

• If the decision is to fail to reject Ho, then there isn’t sufficient evidence at the α level of significance to support the alternative hypothesis.
Power of a Hypothesis Test

It shows the probability of the hypothesis test detecting a significant difference or effect when it really exists.

The power of a hypothesis test is the probability of not making a type II error.
Power of a Hypothesis Test

To differentiate between the light emission from two bulbs:

• How many samples do we need if the average light emission from one bulb differs from another supplier’s?

• How many times should the experiment be replicated to get at least an 85% chance of detecting the factors responsible for affecting the manufacturing process?

Use the power of a test tool in Minitab to get the answers.


Factors that Affect Power

Effect size: The greater the effect size, the greater the power of the test.

Significance level (α): The lower the significance level, the lower the power of the test.

Sample size (n): The greater the sample size, the greater the power of the test.
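The effect of sample size on power can be estimated with a small Monte Carlo simulation; a sketch assuming an illustrative effect of 0.5, a known sigma of 1, and a two-sided z-test at alpha = 0.05 (critical value 1.96):

```python
import random
import math

random.seed(0)

def estimated_power(n, effect=0.5, sigma=1.0, alpha_z=1.96, reps=2000):
    """Monte Carlo estimate of power for a z-test of H0: mu = 0
    when the true mean is `effect` (illustrative values, known sigma)."""
    rejections = 0
    for _ in range(reps):
        sample = [random.gauss(effect, sigma) for _ in range(n)]
        z = (sum(sample) / n) / (sigma / math.sqrt(n))
        if abs(z) > alpha_z:
            rejections += 1
    return rejections / reps

# Power grows with sample size for a fixed effect size and alpha
p5 = estimated_power(5)
p30 = estimated_power(30)
print(p5, p30)
```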
Effect Size

It is the difference between the value specified in the null hypothesis and the true
value for a population parameter.

Effect size = True parameter value - Hypothesized parameter value

Hypothesized population mean of history test scores: 120

True population mean of history test scores: 100

Effect size = 100 - 120 = -20
Significance Level (α)

Reducing the significance level makes the region of acceptance bigger, so the null hypothesis is less likely to be rejected when it is actually false, which leads to a type II error.
Sample Size

As the number of individual observations increases, the standard error decreases.

(Theoretical distributions of means for δ = 5 and S = 1: the distribution is much narrower when n = 30 than when n = 2.)
Delta and Sigma

Delta (δ): The size of the difference between two means, or between one mean and a target value

S: The sample standard deviation of the distribution of individuals of one or both of the samples in question
The Ratio between 𝛅 and S

• When δ is large relative to σ, the differences are large.
• If the variances of the data are large, it is difficult to establish differences.
• It is important to have larger sample sizes to reduce uncertainty.
The Perfect Sample Size

We want to be 95% confident in our estimates.

Perfect sample size:

• The minimum sample size required to provide exactly 5% overlap in order to distinguish the delta

• If you are working with non-normal data, multiply your calculated sample size by 1.1
Typical Questions on Sampling

One could say a sample of 30 is perfect, but that may be too many.
The right sample size is not known without running the test.
Typical Questions on Sampling

Q How many samples should we take?

A Well, that depends on the size of your delta and standard deviation.
Typical Questions on Sampling

Q How should we conduct the sampling?

A Well, that depends on what you want to know.


Typical Questions on Sampling

Q Was the sample we took large enough?

A Well, that depends on the size of your delta and standard deviation.
Typical Questions on Sampling

Q Should we take some more samples just to be sure?

A No, not if you took the correct number of samples the first time.
Hypothesis Testing Roadmap: Decision Tree
Decision Tree

Continuous Y, Continuous X:
• Correlation test
• Regression test

Discrete Y, Continuous X:
• Binary logistic regression
• Nominal logistic regression
• Ordinal logistic regression

Continuous Y, Discrete X:
• Tests of means and tests of variation, each split by normal vs. non-normal data

Discrete Y, Discrete X:
• Chi-square tests: goodness of fit, cross tabulation, two-way table
Decision Tree

Continuous Y, Discrete X:

Tests of means
• Normal data: 1-sample t-test, 2-sample t-test, paired t-test, ANOVA
• Non-normal data: Mood’s median test, Mann-Whitney test, 1-sample sign test, Kruskal-Wallis test, Friedman test, 1-sample Wilcoxon test, 1- and 2-sample proportion tests

Tests of variation
• Normal data: Bartlett test, F-test
• Non-normal data: Levene test
Key Takeaways

A confidence interval is a range of values that likely would contain an unknown population parameter.

Hypothesis testing integrates the voice of the process with the voice of the business to make data-based decisions to resolve problems.

The alpha risk (type I error) is the probability that we could be wrong in saying that something is different.

The beta risk (type II error) is the probability that we could be wrong in saying that two or more things are the same when they are different.
Hypothesis Testing with Normal Data
Learning Objectives

By the end of this lesson, you will be able to:

Differentiate between one-sample t-test and two-sample t-test

Perform a test of equal variance or Bartlett’s test in Minitab

Compare the means of two measurements using a paired t-test

Analyze the relationship between variables using ANOVA


Hypothesis Testing Roadmap for Normal Data

Y Continuous, X Discrete:

Tests of means
• Normal data: 1-sample t-test, 2-sample t-test, paired t-test, ANOVA
• Non-normal data: Mood’s median test, Mann-Whitney test, 1-sample sign test, Kruskal-Wallis test, Friedman test, 1-sample Wilcoxon test, one- and two-sample proportion tests

Tests of variation
• Normal data: Bartlett test, F-test
• Non-normal data: Levene test
Tests of Means (T-Tests)

T-tests are used to:

Compare the mean of a sample against a given target

Compare means from two different samples

Compare paired data


Tests of Means (T-Tests)

A t-test is a type of inferential statistic test.

● It is used to find out if there is a significant


difference between the means of two groups
● It considers:
○ T-statistic
○ T-distribution values
○ Degrees of freedom

For example, the sales team has improved. They want to compare the new mean against
a given target to see if they met the target.
Tests of Means (T-Tests)

Means from two different samples can be compared using a t-test:

• Compare the effectiveness of team A before and after training with the effectiveness of team B before and after training

• Compare the quality of product A with the quality of product B
Tests of Means (T-Tests)

Paired data, such as the effectiveness of the same team before and after training, can be compared with a paired t-test.

Analysis of variance, or ANOVA, is used when it is necessary to compare more than two means.
Practical Analysis

This is the observed (collected) sample data set:

External   Internal
80.3       85.8
84.1       83.2
81.5       84.4
85.5       83.4
83.7       86.0
85.2       80.6
81.3       83.1
88.2       88.0
79.6       86.9
84.7       84.3
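A first descriptive look at the two groups above (means and sample standard deviations) can be taken before any hypothesis test:

```python
import statistics as st

external = [80.3, 84.1, 81.5, 85.5, 83.7, 85.2, 81.3, 88.2, 79.6, 84.7]
internal = [85.8, 83.2, 84.4, 83.4, 86.0, 80.6, 83.1, 88.0, 86.9, 84.3]

# Summarize each group: sample mean and sample standard deviation
print(round(st.mean(external), 2), round(st.stdev(external), 2))
print(round(st.mean(internal), 2), round(st.stdev(internal), 2))
```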
F-Test

In an F-test, the test statistic has an F-distribution under the null hypothesis. It is used to:

Compare statistical models that have been fitted to a specific data set

Identify the model that best fits the population from which the data were sampled

Example: The hypothesis that a proposed regression model fits the data well
One-Sample T-Test

A one-sample t-test compares the mean of a sample to a target (the expected population mean).

Minitab performs a one-sample t-test or t-confidence interval for the mean.
One-Sample T-Test

The one-sample t-test is a parametric test.

• Compares the sample mean to the target, which is the known or hypothesized population mean

• Checks if the sample mean is statistically different from a known or hypothesized population mean

• Compares the test variable against a test value

One-Sample T-Test

The one-sample t-test is also known as a single-sample t-test.

(Example: a sample of values such as 39, 35, 36, 41, 41, 40, 40, 44 drawn from the population distribution.)

The test value is a known or hypothesized value of the mean in the population.
One-Sample T-Test

Use a one-sample t-test to perform a hypothesis test of the mean when the
population standard deviation (σ) is unknown.

• Look for the region in which we can be 95% sure the true population mean will lie

• Use the calculated average, standard deviation, number of trials, and a given alpha risk of .05

If the target falls within the confidence interval of the sample mean, fail to reject the null hypothesis.
One-Sample T-Test

For a one- or two-tailed one-sample test:

• H0: μsample = μtarget If p-value > 0.05, fail to reject Ho

• Ha: μsample ≠, <, > μtarget If p-value < 0.05, reject Ho

P-value: The probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is correct

A smaller p-value indicates that the alternative hypothesis is favored.
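This decision rule can be sketched in code. The data are the eight sample values shown two slides earlier; the target mean of 40 is an assumed value for illustration, and Python/SciPy stands in for the course's Minitab:

```python
# Sketch of a one-sample t-test (two-tailed) against a target mean.
from scipy import stats

sample = [39, 35, 36, 41, 41, 40, 40, 44]
target = 40  # hypothesized population mean (assumption, for illustration)

t_stat, p_value = stats.ttest_1samp(sample, popmean=target)

# Apply the decision rule from the slide
if p_value < 0.05:
    decision = "reject H0"
else:
    decision = "fail to reject H0"
```

With this small sample, the mean of 39.5 is not statistically distinguishable from the target of 40.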


One-Sample T-Test: Sample Size

Target

• A pitfall in statistics is not understanding the proper sample size

• The process mean and the desired target are not the same, but they may be within an acceptable tolerance

SE Mean = S / √n
One-Sample T-Test: Sample Size

When the sample size is two, one cannot tell the difference between the sample and the target (sample size 2).

When the sample size is thirty, one can tell the difference between the sample and the target (sample size 30).

SE Mean = S / √n
One-Sample T-Test: Sample Size

Observation on sample size of 2

• The spread of the distribution of averages from samples of 2 will create uncertainty

• 95% of the area under the curve of a normal distribution falls within +/-2 standard
deviations

• Confidence intervals are based on your selected alpha level:

o If you selected an alpha of 5%, then the confidence interval would be 95%

• The target value falls within +/-2 standard deviations of the sampling distribution

Observation on sample size of 30

• The target appears outside the 95% confidence interval of the mean
One-Sample T-Test: Sample Size

• The standard error of a statistic is the standard deviation of its sampling distribution:

o If the statistic is the sample mean, it is called the standard error of the mean (SEM)

SE Mean = S / √n
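The shrinking standard error is the whole story behind the sample-size comparison above. A minimal sketch (the standard deviation of 2.0 is an assumed value for illustration):

```python
# SE Mean = S / sqrt(n): the standard error of the mean shrinks as the
# sample size grows, which is why a sample of 30 can distinguish the
# process mean from the target when a sample of 2 cannot.
from math import sqrt

s = 2.0  # assumed sample standard deviation (illustrative)

se_n2 = s / sqrt(2)    # sample size 2: wide sampling distribution
se_n30 = s / sqrt(30)  # sample size 30: much narrower
```

Here the standard error drops from about 1.41 to about 0.37, roughly a four-fold tightening of the sampling distribution.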
Two-Sample T-Test

• A two-sample t-test is used to compare two means (μ1 and μ2)

• It is used to compute a confidence interval of the difference between two population means

• It is a frequently used hypothesis test

• It is generally applied to check whether the average difference between two groups is really significant
Two-Sample T-Test

The difference in the hypothesis for the two-tailed test vs. the one-tailed test:

For a two-tailed test:

• H0: μ1 = μ2 If p-value > 0.05, fail to reject Ho

• Ha: μ1 ≠ μ2 If p-value < 0.05, reject Ho

For a one-tailed test:

• H0: μ1 = μ2 If p-value > 0.05, fail to reject Ho

• Ha: μ1 > μ2 (or μ1 < μ2) If p-value < 0.05, reject Ho


Two-Sample T-Test: Example

Step 1

Practical problem: Conduct a study in order to determine the effectiveness of two ceiling fans (fan 1 and fan 2)

• Install two types of fans

• Compare the RPM of the fans

• Determine the difference between the two products

Step 2

Statistical problem:
● Ho: μ1 = μ2
● Ha: μ1 ≠ μ2

Step 3

Statistical solution: Use the two-sample t-test, as the population standard deviations are unknown
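A sketch of how such a two-sample comparison runs in code (assuming Python/SciPy rather than the course's Minitab). The fan RPM data are not shown in the slides, so the external vs. internal sample data from earlier in this section are reused for illustration:

```python
# Sketch of a two-sample t-test on the external vs. internal data.
from scipy import stats

external = [80.3, 84.1, 81.5, 85.5, 83.7, 85.2, 81.3, 88.2, 79.6, 84.7]
internal = [85.8, 83.2, 84.4, 83.4, 86.0, 80.6, 83.1, 88.0, 86.9, 84.3]

# equal_var=False gives Welch's t-test, which does not assume the two
# population variances are equal (the safer default when unsure)
t_stat, p_value = stats.ttest_ind(external, internal, equal_var=False)
# If p_value > 0.05, fail to reject H0: μ1 = μ2
```

For these data the p-value is well above 0.05, so there is no statistical evidence of a difference between the two group means.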
Two-Sample T-Test: Example

● Alpha levels and beta levels are related to each other.
○ Alpha: Probability that you will make the mistake of rejecting the null hypothesis when it is true

● P-value: Measures the probability of getting a value at least as extreme as the one you got from the experiment, assuming the null hypothesis is true
○ If the p-value is greater than alpha, we fail to reject the null hypothesis
Two-Sample T-Test: Example

To obtain alpha (α), subtract the confidence level from 1.


Example:
● For a one-tailed test: If you want to be 95% confident that the analysis is correct, the alpha level will be 1 – 0.95 = 0.05, or 5%
● For a two-tailed test: Divide the alpha level by 2, which in this case will be 2.5% in each tail
Two-Sample T-Test: Example

Type 1 error: An alpha level is the probability of rejecting the null hypothesis when it is true.

A significance level of 0.05 indicates a 5% risk of deciding that a difference exists when there is no actual difference.
Two-Sample T-Test: Example

Type 2 error: A beta level or beta (β) is the probability of failing to reject the null hypothesis when it is false.

If alpha increases, power increases and beta decreases in value.

If alpha decreases, power decreases and beta increases in value.

Alpha, power, and beta are therefore linked.
Two-Sample T-Test: Example

● The power of a test is a measure of quality for a hypothesis test


● The formula for power: (1 – beta). It is between 0 and 1

If the power is close to 1, the hypothesis test is very good at detecting a false null hypothesis.
Beta is generally set at 0.2 but may also be set to smaller values.
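The meaning of the alpha level can be checked empirically. This is a small Monte Carlo sketch (an illustration, not part of the course's Minitab workflow): when H0 is true, the fraction of tests that falsely reject should land near alpha = 0.05.

```python
# Sketch: estimate the Type 1 error rate by simulation. Both groups are
# drawn from the SAME population, so H0 is true and every rejection at
# p < 0.05 is a false rejection.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded for reproducibility
trials = 1000
false_rejections = 0

for _ in range(trials):
    a = rng.normal(loc=50, scale=5, size=20)
    b = rng.normal(loc=50, scale=5, size=20)  # same population as a
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_rejections += 1

alpha_hat = false_rejections / trials  # expected to be near 0.05
```

Estimating beta (and hence power) works the same way, but with the two groups drawn from populations whose means genuinely differ.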
Two-Sample T-Test: Example

Step 4

Open the worksheet in Minitab and code the data:
● Work with the data in the BTU
● Unstack the data by damper type
Two-Sample T-Test: Example

Check if the data is normal:

Normality Test
Probability Plot
Paired T-Test

• A paired t-test is performed to compare the means of two measurements from the same or identical samples (μbefore vs. μafter, with difference δ)

• It is also used to compare a process before and after a solution is implemented

• It can be performed in Minitab

• It is appropriate for testing when data are paired and the paired differences follow a normal distribution
Paired T-Test

Use the paired t command to:

• Compute a confidence interval
• Perform a hypothesis test of the difference between population means
Paired T-Test

A paired t-procedure matches responses that are dependent on each other.

It allows you to account for variability between the pairs usually resulting in a smaller error term.

This increases the sensitivity of the hypothesis test or confidence interval.

μδ is the population mean of the differences, and μ0 is the hypothesized mean of the differences, typically zero.
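A sketch of the paired t-test in code (assuming Python/SciPy and made-up before/after process data), which also demonstrates that the paired test is exactly a one-sample t-test on the paired differences against μ0 = 0:

```python
# Sketch of a paired t-test with hypothetical before/after data.
from scipy import stats

before = [12.1, 11.8, 13.0, 12.4, 11.5, 12.9, 12.2, 11.9]  # made-up
after = [11.6, 11.4, 12.5, 12.0, 11.3, 12.3, 11.8, 11.5]   # made-up

t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent: one-sample t-test on the paired differences vs. mu0 = 0
diffs = [b - a for b, a in zip(before, after)]
t_one, p_one = stats.ttest_1samp(diffs, popmean=0)
# t_paired == t_one and p_paired == p_one (up to rounding)
```

Because every "after" value dropped relative to its paired "before" value, the differences have very little variability and the test detects the shift easily; this is the smaller error term the slide refers to.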
Purpose of ANOVA

Analysis of Variance or ANOVA is the method for analyzing and modeling the relationship between a response variable (Y) and independent variables (Xs).

ANOVA extends the two-sample t-test, which tests for equality of two population means.
Purpose of ANOVA

The independent variable or factor usually has three or more levels.

It helps to find significant differences among means using multiple comparisons.


What Do We Want to Know?

Is the variation between the groups large enough to be distinguished from the variation within the groups?

(Figure: total (overall) variation decomposed into the between-group variation — the difference δ between group means μ1 and μ2 — and the within-group variation, e.g., within the level of supplier 1)
Calculating ANOVA

(Figure: total (overall) variation shown as the combination of the between-group variation (δ) and the within-group variation)

Calculating ANOVA

Between Group Variation: SSB = Σj nj (X̄j − X̿)²

Within Group Variation: SSW = Σj Σi (Xij − X̄j)²

Total Variation: SST = Σj Σi (Xij − X̿)²

Here, j runs from 1 to g and i runs from 1 to nj, and:

g = the number of groups (levels in the study)
Xij = the ith individual in the jth group
nj = the number of individuals in the jth group or level
X̿ = the grand mean
X̄j = the mean of the jth group or level
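These three sums of squares obey the identity SSB + SSW = SST, which a short sketch can verify directly (using the three supplier samples from the carpet example later in this section as the groups):

```python
# Sketch: compute the ANOVA sums of squares by hand and verify that
# between-group SS + within-group SS = total SS.
groups = [
    [3.1, 4.4, 3.49, 3.8, 3.55],     # supplier A
    [4.21, 3.9, 3.85, 4.15, 3.77],   # supplier B
    [4.65, 3.98, 4.22, 3.86, 3.4],   # supplier C
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SSB: weighted squared distance of each group mean from the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups)
# SSW: squared distance of each value from its own group mean
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
# SST: squared distance of each value from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# ss_between + ss_within equals ss_total (up to floating-point rounding)
```

The F statistic then compares the mean squares, MSB / MSW, with MSB = SSB / (g − 1) and MSW = SSW / (N − g).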
Alpha Risk and Pairwise T-Tests

A t-test cannot be used to evaluate a series of means, because the alpha risk
increases as the number of means increases.

Formula: 1 – (1 – α)^k

Here, k = the number of pairs of means compared

So, for 7 pairs of means and an α of 0.05:

1 – (1 – 0.05)^7 ≈ 0.30

This means a 30% alpha risk
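The arithmetic above, computed directly:

```python
# Family-wise alpha risk from repeated pairwise t-tests.
alpha = 0.05
k = 7  # number of pairwise comparisons

family_risk = 1 - (1 - alpha) ** k
# family_risk is about 0.302: roughly a 30% chance of at least one
# false rejection somewhere across the 7 comparisons
```

This inflation is exactly why ANOVA, rather than a series of pairwise t-tests, is used to compare several means at once.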


Sample Differentiation: Example

A new start-up has recently purchased an office and wants to get the carpeting done for all the
floors. Three carpet suppliers claim that their carpets are equal in levels of quality and durability.

Supplier A Supplier B Supplier C


3.1 4.21 4.65
4.4 3.9 3.98
3.49 3.85 4.22
3.8 4.15 3.86
3.55 3.77 3.4

Test the data to determine whether there is a difference between the three suppliers.
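A sketch of this analysis as a one-way ANOVA (assuming Python/SciPy; the course performs the equivalent analysis in Minitab):

```python
# Sketch: one-way ANOVA on the three supplier samples.
from scipy import stats

supplier_a = [3.1, 4.4, 3.49, 3.8, 3.55]
supplier_b = [4.21, 3.9, 3.85, 4.15, 3.77]
supplier_c = [4.65, 3.98, 4.22, 3.86, 3.4]

f_stat, p_value = stats.f_oneway(supplier_a, supplier_b, supplier_c)
# If p_value > 0.05, fail to reject H0: no detectable difference
# between the three suppliers at the 0.05 level.
```

For these samples the p-value comes out well above 0.05, consistent with the suppliers' claim of equal quality.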
Test for Normality

Compare the p-values. All three suppliers' samples are normally distributed.

Supplier A P-value 0.585 Supplier B P-value 0.333 Supplier C P-value 0.883


Histogram of Residuals

(responses are supplier A, supplier B, and supplier C)

The histogram of residuals should show a bell-shaped curve.


Normal Probability Plot of Residuals

(responses are supplier A, supplier B, and supplier C)


Residuals vs. Fitted Values

Residuals vs. the Fitted Values


(responses are supplier A, supplier B, and supplier C)

There should be no outliers present in this plot.


Paired vs. Independent Samples

Paired versus independent samples may be difficult to judge.

Paired samples occur on the same subject. Example: tire wear of two brands on the same car at the same time.

Independent samples require independent random samples. Example: wear of two tire brands, one brand on car A and one on car B.
Equal vs. Unequal Variances

If equality is assumed erroneously, you can be seriously misled in estimating and testing the difference in means.

If inequality is assumed, when the variances are equal, we get a slightly conservative
approach where precision in estimation is lost.
Random Sampling

T-tests assume data are gathered by randomly sampling from a normal distribution.

If the distribution is not normal, the t-distribution still gives good approximations on random samples.

Obtaining a nonrandom sample is of greater concern than data that is not normally distributed.
Key Takeaways

T-tests are used to compare the mean of a sample against a given target, means from two different samples, and paired data.

The paired test can be used to analyze the difference between the means obtained from two related samples.

ANOVA is used to investigate and model the relationship between a response variable and one or more independent variables.

F-test is a statistical test where the test statistic has an F-distribution under the null hypothesis.
Key Takeaways

One-sample t-test is used to analyze the difference between a mean obtained from a single sample and a target value or historical mean.

The purpose of the two-sample t-test is to analyze the difference between the means obtained from two independent samples:
• The two samples are independent
• Variances may be equal or unequal
• Data may be in a single column with a grouping variable or in two different columns

Test for equal variances or Bartlett's test is used to analyze the difference in variances of two or more independent samples.
Hypothesis Testing with Non-Normal Data
Learning Objectives

By the end of this lesson, you will be able to:

Classify different types of nonparametric tests

Identify the nonparametric tests for central tendency of data using median

Relate the uses of different chi-square tests


Nonparametric Tests

Nonparametric tests do not make assumptions about normality.

Nonparametric hypothesis testing works the same way as parametric testing:

• The population sample is randomly drawn
• The selected population is representative of the general population
• The data is in an interval or a ratio scale
Nonparametric Tests: Assumptions

Nonparametric tests can be applied when:

• Data is measured on any scale

• Data does not follow any specific distribution and no assumptions about the population are made
Nonparametric Tests: Types

• Mann-Whitney Test
• Kruskal-Wallis Test
• Mood's Median Test
• Friedman Test
• One-Sample Sign Test
• One-Sample Wilcoxon Test
• One- and Two-Sample Proportion Tests
• Chi-Square Tests
Mann-Whitney Test

It compares the differences between two independent groups when the dependent variables are ordinal.

(Figure: male vs. female engagement-score distributions in two cases — identical distributions, where the two distributions overlay perfectly, and distributions with the same shape but a different location)
The null hypothesis for this test is that the two populations are equal.
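A sketch of the test in code (assuming Python/SciPy; the two samples of ordinal scores are made up, and are deliberately well separated so the test rejects):

```python
# Sketch of a Mann-Whitney test on two independent samples of ordinal
# scores (hypothetical data, clearly separated groups).
from scipy import stats

group_1 = [1, 2, 3, 4, 5]       # made-up ordinal scores
group_2 = [8, 9, 10, 11, 12]    # made-up, shifted upward

u_stat, p_value = stats.mannwhitneyu(group_1, group_2,
                                     alternative="two-sided")
# H0: the two populations are equal; reject if p_value < 0.05
```

Because the test works on ranks rather than raw values, it makes no normality assumption about either group.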
Kruskal-Wallis Test

It is a nonparametric test used to compare two or more independent samples of equal or different sizes.

The null hypothesis is that population medians are equal.
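A sketch with three made-up samples of unequal size (assuming Python/SciPy), constructed so the medians clearly differ:

```python
# Sketch of a Kruskal-Wallis test on three independent samples.
from scipy import stats

group_1 = [1.2, 1.9, 2.4, 2.0]        # made-up data
group_2 = [2.8, 3.3, 3.1, 2.9, 3.5]   # made-up, unequal sample size
group_3 = [4.0, 4.6, 4.1, 4.8]        # made-up

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
# H0: the population medians are equal; reject if p_value < 0.05
```

Like the Mann-Whitney test it ranks the pooled data, so it serves as the nonparametric counterpart of one-way ANOVA.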


Mood’s Median Test

It compares the medians of two samples. It is best suited for small sample sizes.

(Figure: groups 1 and 2 plotted between the min and max, compared against the overall median)

The null hypothesis for this test is that the medians for both samples are the same.
Friedman Test

It is used to test the differences between groups with ordinal dependent variables.

It is best suited for instances where the same parameter is measured under different conditions on the same subject.

The null hypothesis is that there are no differences between the variables measured under different conditions.

(Figure: individual value plot of response vs. ad type — direct mail, magazine, and newspaper ads)
One-Sample Sign Test

It is a nonparametric hypothesis test that determines if there is any difference between the
median of a non-normally distributed data set and a reference value.

Compare: the median of a non-normally distributed data set and a reference value

The data should be taken from two samples that are paired.
One-Sample Sign Test

The null hypothesis for this test is that the median of a distribution is equal to a
hypothesized value.

Median of a distribution
= A hypothesized value
One-Sample Wilcoxon Test

It is similar to a one-sample sign test.

It is used to compare the median of the population with the hypothesized median.

The null hypothesis states that the population median is equal to the hypothesized median.
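A sketch of the one-sample version in code (assuming Python/SciPy, where the one-sample case is handled by testing the signed differences from the hypothesized median; the data and the hypothesized median of 10 are made up):

```python
# Sketch of a one-sample Wilcoxon signed-rank test: subtract the
# hypothesized median, then test whether the signed differences are
# symmetric around zero.
from scipy import stats

sample = [3, 5, 8, 9, 13, 16, 21]  # made-up data
hypothesized_median = 10           # made-up reference value

diffs = [x - hypothesized_median for x in sample]
w_stat, p_value = stats.wilcoxon(diffs)
# H0: the population median equals 10; reject if p_value < 0.05
```

Here the positive and negative rank sums are nearly balanced, so the test fails to reject the hypothesized median.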
One-Sample Proportion Test

• It is used to estimate the proportion of a population.

• It compares the proportion of a population to a reference value.

• It is most suited when the population follows a binomial distribution.

The null hypothesis states that the population proportion is equal to the
hypothesized proportion.
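A sketch of this test via an exact binomial test (assuming Python with SciPy 1.7 or later for `binomtest`; the counts of 45 successes out of 100 and the hypothesized proportion of 0.5 are made up):

```python
# Sketch of a one-sample proportion test as an exact binomial test.
from scipy import stats

# 45 successes observed in 100 trials (hypothetical counts),
# tested against a hypothesized population proportion of 0.5
result = stats.binomtest(45, 100, p=0.5)
p_value = result.pvalue
# H0: the population proportion equals 0.5; reject if p_value < 0.05
```

An observed proportion of 0.45 is well within sampling noise of 0.5 for n = 100, so the test fails to reject.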
Two-Sample Proportion Test

It is used to determine whether the proportions of two separate samples differ.

The null hypothesis of this test states that the population proportions of the two samples are equal.
Chi-Square Test

It can be used to compare a parameter of a population with a hypothesized parameter when the data is not normal.

• Test of Independence
• Goodness of Fit
• Cross-Tabulation Test
Chi-Square Test of Independence

It is used to check if there is any relationship between two nominal variables from a population.

Test of independence: compares two variables
Goodness of Fit test: compares only one variable
Chi-Square Goodness of Fit Test

It is a nonparametric test used to compare the differences in an observed value of a given attribute from the expected value.

It can be used when you want to check how well a sample distribution of one
variable fits the population distribution.

It can be used when you have a variable from the population and would like
to compare the sample variable with the population variable.
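A sketch of the goodness-of-fit test (assuming Python/SciPy; the observed counts are made up, tested against a uniform expected distribution):

```python
# Sketch of a chi-square goodness-of-fit test: do the observed counts
# fit an assumed (here, uniform) distribution across categories?
from scipy import stats

observed = [16, 18, 16, 14, 12, 12]  # hypothetical category counts

# With no expected frequencies supplied, chisquare assumes all
# categories are equally likely (uniform expected counts).
chi2_stat, p_value = stats.chisquare(observed)
# If p_value > 0.05, fail to reject H0: the counts fit the distribution
```

For these counts the statistic is small relative to its 5 degrees of freedom, so the uniform model is not rejected.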
Chi-Square Cross-Tabulation Test

It is an arrangement in a matrix format that summarizes the relationship between two categorical variables.

Technical Writing Expertise


Major
None Beginner Intermediate Expert

Stem 8 7 11 39

Soc. Sci. 11 41 13 9
Humanities 9 11 25 19

Interdisciplinary 15 9 21 14

It is also known as a contingency table.


Chi-Square Cross-Tabulation Test


In a chi-square cross-tabulation test:

• Data values of two or more categorical values are displayed in a contingency table.

• Decisions are made based on the table values.
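A sketch of the decision step on the contingency table above, run as a chi-square test of independence (assuming Python/SciPy):

```python
# Sketch: chi-square test of independence on the major vs. technical
# writing expertise contingency table from the slide.
from scipy import stats

table = [
    [8, 7, 11, 39],    # Stem
    [11, 41, 13, 9],   # Soc. Sci.
    [9, 11, 25, 19],   # Humanities
    [15, 9, 21, 14],   # Interdisciplinary
]

chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)
# H0: major and expertise are independent; reject if p_value < 0.05
```

For a 4x4 table the test has (4−1)(4−1) = 9 degrees of freedom, and for these counts the p-value falls well below 0.05, so major and expertise are not independent.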


Key Takeaways

Nonparametric tests can be used to test a hypothesis where the data is not normally distributed.

Three types of chi-square tests are:


1. Test of independence
2. Goodness of fit
3. Cross-tabulation tests
Key Takeaways

The different nonparametric tests include:


• Mann-Whitney test
• Kruskal-Wallis test
• Mood’s Median test
• Friedman test
• One-sample sign test
• One-sample Wilcoxon test
• One- and two-sample proportion tests
• Chi-square tests
