Tips and Tricks For Analyzing Non-Normal Data
Many statistical analyses are based on an assumed distribution—in other words, they assume that
your data resemble a certain shape. And the most commonly assumed distribution, or shape, is the
normal distribution. However, normally distributed data isn’t always the norm.
A normal distribution has a symmetric bell shape and is centered at the mean.
Some measurements naturally follow a non-normal distribution. For example, non-normal data often results
when measurements cannot go beyond a specific point or boundary. Consider wait times at a doctor’s office
or customer hold times at a call center where it’s not possible to wait a negative amount of time. These
scenarios have a hard boundary at 0, which can skew the data to the right.
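As a rough illustration of how a hard boundary at 0 produces right skew, the sketch below simulates hypothetical hold times. The exponential model, the 5-minute mean, and the names `sample_skewness` and `hold_times` are assumptions made for this example and do not come from real call-center data.

```python
import random
import statistics


def sample_skewness(values):
    """Return the moment coefficient of skewness (positive means right-skewed)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical hold times in minutes: exponential with a mean of 5, never below 0.
hold_times = [rng.expovariate(1 / 5) for _ in range(10_000)]

skew = sample_skewness(hold_times)
mean_time = statistics.fmean(hold_times)
median_time = statistics.median(hold_times)

print(f"shortest hold time: {min(hold_times):.3f}")
print(f"mean: {mean_time:.2f}, median: {median_time:.2f}, skewness: {skew:.2f}")
```

In data like these, the long right tail pulls the mean above the median, and the skewness is positive.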
This article will cover various methods for detecting non-normal data, and will review valuable tips and tricks
for analyzing non-normal data when you have it.
Normal or Not
Several graphical and statistical tools can be used to assess whether your data follow a normal distribution.
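One such check can be sketched in a few lines of Python. The example computes the correlation between the sorted data and the matching normal quantiles, which is the numerical counterpart of asking whether the points on a normal probability plot fall along a straight line. The function name `probability_plot_correlation`, the Blom plotting positions, and the two simulated samples are assumptions chosen for illustration. This is not a formal normality test and does not produce a p-value.

```python
import math
import random
from statistics import NormalDist, fmean


def probability_plot_correlation(values):
    """Correlation between sorted data and matching standard normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Blom plotting positions, one common convention for normal probability plots.
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x, mean_q = fmean(ordered), fmean(quantiles)
    sxq = sum((x - mean_x) * (q - mean_q) for x, q in zip(ordered, quantiles))
    sxx = sum((x - mean_x) ** 2 for x in ordered)
    sqq = sum((q - mean_q) ** 2 for q in quantiles)
    return sxq / math.sqrt(sxx * sqq)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Two hypothetical samples: one normal, one right-skewed (exponential).
normal_sample = [rng.gauss(50, 5) for _ in range(200)]
skewed_sample = [rng.expovariate(1 / 5) for _ in range(200)]

r_normal = probability_plot_correlation(normal_sample)
r_skewed = probability_plot_correlation(skewed_sample)

print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

A correlation close to 1 suggests the points lie near a straight line on the probability plot, while a noticeably lower value points toward non-normality. For a formal test with a p-value or critical values, a statistical package is the better choice.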
And for certain analyses, it’s not the actual data that should follow a normal distribution, but rather the residuals.
Residual: the difference between an observed Y value and its corresponding fitted, or predicted, value.
Take regression, design of experiments (DOE), and ANOVA, for example. You don’t need to check Y for normality because any significant X’s will affect its shape, inherently lending itself to a non-normal distribution.
Analyzing Non-Normal Data
When you do have non-normal data and the distribution does matter, there are several techniques available to properly conduct your analysis.
1. Nonparametrics
Suppose you want to run a 1-sample t-test to determine if a population’s average equals a specific target value.
Although t-tests are robust to the normality assumption, suppose you have a small sample size and are concerned about non-normality. Or, suppose you have a sufficient sample size, but you don’t believe the average is the best measure of central tendency for your data.
Instead of a parametric test such as the t-test, which is based on the mean of the data, you can use a nonparametric, distribution-free test such as the 1-sample sign test to test if the median is on target.
Nonparametric alternatives are available for the most commonly used parametric hypothesis tests. For example, the Spearman correlation is the nonparametric alternative to the Pearson correlation.
Note there are also nonparametric approaches to analyses beyond hypothesis tests, such as tolerance intervals.
2. Alternative Distributions
For certain analyses that are highly sensitive to the normality assumption, such as reliability and survival probabilities, finding a distribution that adequately fits the data is critical. And for reliability applications, it’s quite normal to have non-normally distributed data.
For instance, the Weibull distribution is quite common when modeling time-to-failure data. This versatile distribution can be skewed left, skewed right, or even approximately symmetric.
When analyzing data where the risk of failure does not depend on the age of the unit, the exponential distribution may be most suitable. For example, failure times for many electrical components typically follow an exponential distribution.
Other common distributions for reliability data include the lognormal distribution for modeling cycles-to-failure in fatigue and load testing, and the extreme value distribution for modeling breaking strength.
And if you’re not sure what distribution best fits your data, then you can use tools like Minitab’s Individual Distribution Identification to find out.
When your data follow a Weibull, exponential, or some other non-normal distribution, you don’t have to be restricted to using the normal distribution to run your analysis. Instead, use the distribution that best fits your data.
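To make the idea of fitting an alternative distribution concrete, here is a minimal sketch that estimates Weibull parameters from time-to-failure data. It uses median rank regression, a least-squares line on a Weibull probability plot. That is a swapped-in technique chosen for brevity, not a reproduction of Minitab’s Individual Distribution Identification. The function name `fit_weibull_median_rank` and the simulated failure times are assumptions made for this example.

```python
import math
import random


def fit_weibull_median_rank(times):
    """Estimate Weibull (shape, scale) by least squares on a Weibull probability plot."""
    ordered = sorted(times)
    n = len(ordered)
    xs, ys = [], []
    for i, t in enumerate(ordered, start=1):
        f = (i - 0.3) / (n + 0.4)  # Benard's approximation to the median rank
        xs.append(math.log(t))
        ys.append(math.log(-math.log(1 - f)))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a Weibull CDF, ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale).
    shape = slope
    scale = math.exp(-intercept / slope)
    return shape, scale


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical failure times in hours, simulated with scale 1000 and shape 1.5.
failure_times = [rng.weibullvariate(1000, 1.5) for _ in range(500)]

shape, scale = fit_weibull_median_rank(failure_times)
print(f"estimated shape: {shape:.2f}, estimated scale: {scale:.0f}")
```

An estimated shape close to 1 corresponds to the exponential special case, where the risk of failure does not depend on the age of the unit.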
Figure caption: The probability plots and p-values reveal that a Weibull distribution provides the best fit for this data set.
3. Transformations
Capability analysis, which is used to determine if a process falls within specifications, is also highly sensitive to the distribution assumption.
Like reliability analysis, you can use a non-normal distribution to calculate process capability, or alternatively, you can try to transform your data to follow a normal distribution using either the Box-Cox or Johnson transformation.
When you transform your data, you modify the original data using a function of a variable. Functions used in the Box-Cox transformation are simple, and include taking the square root, inverse, or natural log of the original data.
Functions used in the Johnson transformation are more complex than Box-Cox, but they are also more powerful. Here’s an example of what a Johnson transformation function looks like:
-3.80937 + 1.67123 * Ln (X + 0.160537)
Although the Box-Cox and Johnson transformations often successfully transform non-normally distributed data to normally distributed data, they are not foolproof. Sometimes the transformed data will not follow a normal distribution, just like the original data. In that case, consider using an alternative distribution, as described for reliability analysis.
The End
Non-normal data can occur for many reasons. Perhaps your data:
• Were sampled from different populations (locations, genders, seasons, etc.).
• Shift and drift over time.
• Contain extreme outliers.
• Have insufficient resolution (too few significant digits).
And some data sets exhibit none of the above, but instead are inherently non-normal, as we previously discussed.
Whatever the case may be, the first step in analyzing non-normal data is to understand why it’s non-normal, especially if you have reason to believe it should be normal.
For example, if you discover your data are in fact sampled from different populations, consider analyzing each population independently. Perhaps each group when treated separately is normally distributed.
Once you understand the distribution and associated properties of your data, there are many tools, all available in Minitab Statistical Software, to help you properly analyze it, gain valuable insight, and find meaningful solutions to your toughest problems.
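The Box-Cox and Johnson transformations described in the Transformations section can be sketched in Python as follows. The Box-Cox part uses the standard textbook form of the transformation and picks lambda from a grid by maximizing the profile log-likelihood. The grid, the simulated lognormal data, and the function names are assumptions for illustration and may differ from how any particular package chooses lambda. The Johnson part simply evaluates the example function quoted in the article.

```python
import math
import random


def box_cox(x, lam):
    """Box-Cox transform of one positive value; lam = 0 is the natural log."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def box_cox_log_likelihood(data, lam):
    """Profile log-likelihood (up to a constant) of lam under normality."""
    n = len(data)
    transformed = [box_cox(x, lam) for x in data]
    mean = sum(transformed) / n
    variance = sum((y - mean) ** 2 for y in transformed) / n
    return -n / 2 * math.log(variance) + (lam - 1) * sum(math.log(x) for x in data)


def johnson_example(x):
    """The example Johnson transformation function quoted in the article."""
    return -3.80937 + 1.67123 * math.log(x + 0.160537)


rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical right-skewed, strictly positive data (lognormal).
data = [rng.lognormvariate(2, 1.0) for _ in range(300)]

# Candidate lambdas from -2 to 2 in steps of 0.1.
# -1 is the inverse, 0 the natural log, 0.5 the square root.
candidates = [round(-2 + 0.1 * k, 1) for k in range(41)]
best_lambda = max(candidates, key=lambda lam: box_cox_log_likelihood(data, lam))

print(f"best lambda on the grid: {best_lambda}")
print(f"Johnson example at X = 1: {johnson_example(1):.4f}")
```

Because the simulated data are lognormal, a lambda near 0 (the natural log) is the expected choice here. With real data, the transformed values should still be checked for normality, since the transformation is not guaranteed to work.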
Minitab® and the Minitab® logo are all registered trademarks of Minitab, Inc., in the United States and
other countries. See minitab.com/legal/trademarks for more information.