Skewness


Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
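The sketch below assumes the Fisher-Pearson moment coefficient g1 = m3 / m2^(3/2). This is one common definition of skewness, and it is not one of the Pearson coefficients described later. The example shows two things: the sign of the value follows the direction of the longer tail, and the value is undefined when every data point is identical, because then m2 = 0 and the ratio has no value. The function name `moment_skewness` and the data are hypothetical.

```python
from statistics import fmean


def moment_skewness(data):
    """Hypothetical helper: Fisher-Pearson moment coefficient g1 = m3 / m2**1.5.

    Returns None when skewness is undefined (all values equal, so m2 == 0).
    """
    mean = fmean(data)
    m2 = fmean([(x - mean) ** 2 for x in data])  # second central moment (variance)
    m3 = fmean([(x - mean) ** 3 for x in data])  # third central moment
    if m2 == 0:
        return None  # no spread at all: skewness is undefined
    return m3 / m2 ** 1.5


print(moment_skewness([1, 2, 2, 3, 10]) > 0)  # True: long right tail
print(moment_skewness([1, 8, 9, 9, 10]) < 0)  # True: long left tail
print(moment_skewness([5, 5, 5]))             # None: undefined
```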

In a perfect normal distribution, the tails on either side of the curve are exact mirror
images of each other.

When a distribution is skewed to the left, the tail on the curve’s left-hand side is longer than the tail on the right-hand side, and the mean is typically less than the mode. This situation is also called negative skewness.

When a distribution is skewed to the right, the tail on the curve’s right-hand side is longer than the tail on the left-hand side, and the mean is typically greater than the mode. This situation is also called positive skewness.
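You can check the mean-versus-mode relationship on a small, hypothetical data set. This sketch uses Python's standard `statistics` module. The second list is the mirror image of the first (each value x becomes 16 - x), so its long tail points the other way.

```python
from statistics import mean, mode

# Hypothetical data with a long right tail (a few large values)
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15]
# Mirror image (16 - x) of the same data: the long tail now points left
left_skewed = [16 - x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name}: mean = {mean(data):.2f}, mode = {mode(data)}")
# right-skewed: mean = 4.11, mode = 1
# left-skewed: mean = 11.89, mode = 15
```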

[Figure: Skewness (image courtesy: https://www.safaribooksonline.com/library/view/clojure-for-data/9781784397180/ch01s13.html)]

How to calculate the skewness coefficient?

Two common methods for calculating the skewness coefficient of a sample are:

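1] Pearson First Coefficient of Skewness (Mode skewness)

$$Sk_1 = \frac{\bar{x} - \mathrm{Mo}}{s}$$

2] Pearson Second Coefficient of Skewness (Median skewness)

$$Sk_2 = \frac{3(\bar{x} - \mathrm{Md})}{s}$$

where $\bar{x}$ is the sample mean, Mo the mode, Md the median, and $s$ the sample standard deviation.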

Interpretations

• The direction of skewness is given by the sign. A zero means no skewness at all.

• A negative value means the distribution is negatively skewed. A positive value means the distribution is positively skewed.

• The coefficient compares the sample distribution with a normal distribution, which is symmetric and has zero skewness. The larger the absolute value of the coefficient, the more the distribution departs from that symmetry.

Sample problem: Use Pearson’s Coefficient #1 and #2 to find the skewness for data
with the following characteristics:

• Mean = 50.

• Median = 56.

• Mode = 60.

• Standard deviation = 8.5.

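Pearson’s First Coefficient of Skewness: (50 - 60) / 8.5 ≈ -1.18.

Pearson’s Second Coefficient of Skewness: 3 × (50 - 56) / 8.5 ≈ -2.12.

Both values are negative, so the data are negatively skewed (mean < median < mode).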
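Each Pearson coefficient takes one line of arithmetic. The sketch below reproduces the sample problem's values. The function names `pearson_first` and `pearson_second` are illustrative and do not come from any library.

```python
def pearson_first(x_bar, mo, s):
    """Pearson's first (mode) coefficient of skewness: (mean - mode) / s."""
    return (x_bar - mo) / s


def pearson_second(x_bar, md, s):
    """Pearson's second (median) coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (x_bar - md) / s


# Summary statistics from the sample problem: mean, median, mode, standard deviation
x_bar, md, mo, s = 50, 56, 60, 8.5

sk1 = pearson_first(x_bar, mo, s)
sk2 = pearson_second(x_bar, md, s)
print(f"Sk1 = {sk1:.2f}, Sk2 = {sk2:.2f}")  # Sk1 = -1.18, Sk2 = -2.12
```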

Note: Pearson’s first coefficient of skewness uses the mode. If the most frequent value occurs only a few times, the mode is not a stable measure of central tendency. For example, the mode of both of these data sets is 4:

1, 2, 3, 4, 4, 5, 6, 7, 8, 9.

1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10, 12, 12, 13.

In the first set of data, the mode appears only twice, so it is not a good idea to use Pearson’s First Coefficient of Skewness. In the second set, the mode 4 appears 8 times, so Pearson’s First Coefficient of Skewness will likely give you a reasonable result.
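Before relying on Pearson's first coefficient, it helps to check how often the mode occurs. One option is to count values with `collections.Counter` from the Python standard library. This small sketch applies it to the two data sets above.

```python
from collections import Counter

set_1 = [1, 2, 3, 4, 4, 5, 6, 7, 8, 9]
set_2 = [1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10, 12, 12, 13]

for name, data in [("set 1", set_1), ("set 2", set_2)]:
    # most_common(1) returns [(value, count)] for the most frequent value
    value, count = Counter(data).most_common(1)[0]
    print(f"{name}: mode = {value}, appears {count} times out of {len(data)}")
# set 1: mode = 4, appears 2 times out of 10
# set 2: mode = 4, appears 8 times out of 20
```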
Kurtosis

Kurtosis is a measure of the extremity of deviations from the mean (that is, how far from the mean the outliers lie). The values obtained when measuring a distribution's kurtosis are related to the tails of the distribution, not its peak. Every univariate normal distribution has a kurtosis of exactly 3, so values are generally expressed as the excess over this. The excess kurtosis is defined as kurtosis - 3; i.e. any amount by which the kurtosis is above or below 3 (the value for a normal distribution) is excess kurtosis.
I believe many available applications report this excess value directly, in which case you do not have to do the subtraction yourself; check your software's documentation to be sure.
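Here is a minimal sketch of the moment-based kurtosis, m4 / m2^2, and of the excess version obtained by subtracting 3. It assumes the population (biased) moment estimators. Statistical packages may apply small-sample corrections, so their numbers can differ slightly. SciPy's `scipy.stats.kurtosis`, for example, returns excess kurtosis by default (`fisher=True`). The `kurtosis` function here is hypothetical and uses the standard library only.

```python
from statistics import fmean


def kurtosis(data, excess=True):
    """Hypothetical helper: moment kurtosis m4 / m2**2, minus 3 if excess=True."""
    mean = fmean(data)
    m2 = fmean([(x - mean) ** 2 for x in data])  # second central moment
    m4 = fmean([(x - mean) ** 4 for x in data])  # fourth central moment
    k = m4 / m2 ** 2
    return k - 3 if excess else k


# Two equally frequent values: the lightest tails possible
two_point = [-1, 1]
print(kurtosis(two_point, excess=False))  # 1.0
print(kurtosis(two_point))                # -2.0
```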
A distribution with 0 excess kurtosis (i.e. a kurtosis of exactly 3, as for a normal distribution) is called mesokurtic, or mesokurtotic; the standard example is the normal distribution family.
A distribution with positive excess kurtosis is called leptokurtic, or leptokurtotic. The 0.572 that you have given falls into this category. It indicates a curve that is 'slender' in shape, i.e. it has fatter tails, which means it produces more outliers, or more extreme ones, than a normal distribution.
A distribution with negative excess kurtosis is called platykurtic, or platykurtotic. The curve will appear broad, with thinner tails, indicating fewer and less extreme outliers than a normal distribution. Your -0.625 falls into this category.
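The classification above depends only on the sign of the excess kurtosis. This sketch labels the two values discussed, 0.572 and -0.625. The `tol` parameter is a hypothetical addition: it lets you treat values very close to 0 as mesokurtic, since sample estimates are rarely exactly 0.

```python
def classify_kurtosis(excess_kurtosis, tol=0.0):
    """Label a distribution by its excess kurtosis.

    tol is a hypothetical tolerance: values within +/- tol of 0 count as mesokurtic.
    """
    if abs(excess_kurtosis) <= tol:
        return "mesokurtic"
    return "leptokurtic" if excess_kurtosis > 0 else "platykurtic"


for value in (0.572, -0.625, 0.0):
    print(value, classify_kurtosis(value))
# 0.572 leptokurtic
# -0.625 platykurtic
# 0.0 mesokurtic
```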
The above is what I understand and have summarised from the Wikipedia article on Kurtosis (https://en.wikipedia.org/wiki/Kurtosis).
I hope you now understand what kurtosis is and what its values represent. However, I am not a statistician by profession; I only needed to understand it when I used to do a lot of statistical calculations. I hope an expert statistician here on RG will point out any errors in the above.

The exact interpretation of the measure of Kurtosis used to be disputed but is now
settled. It's about the existence of outliers. Kurtosis is a measure of whether the data
are heavy-tailed (profusion of outliers) or light-tailed (lack of outliers) relative to a
normal distribution.
[Figure: Kurtosis (image courtesy: https://mvpprograms.com/help/mvpstats/distributions/SkewnessKurtosis)]

There are three types of Kurtosis

Mesokurtic

A mesokurtic distribution has the same kurtosis as the normal distribution: a kurtosis of 3, which is an excess kurtosis of zero.

Leptokurtic

A leptokurtic distribution has greater kurtosis than a mesokurtic distribution (positive excess kurtosis). The tails of such distributions are thick and heavy. Such a curve is often described as more peaked than the mesokurtic curve, although kurtosis reflects the tails of the distribution rather than its peak.

Platykurtic

A platykurtic distribution has lower kurtosis than a mesokurtic distribution (negative excess kurtosis). The tails of such distributions are thinner. Such a curve is often described as less peaked, or flatter, than the mesokurtic curve.

The main difference between skewness and kurtosis is that skewness refers to the degree of asymmetry of a distribution, whereas kurtosis refers to the presence of outliers (how heavy the tails are) in the distribution.
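To see that the two measures capture different things, consider a hypothetical data set that is perfectly symmetric but has two far-out values. This sketch uses the moment-based definitions: skewness m3 / m2^(3/2) and excess kurtosis m4 / m2^2 - 3, with population moments. That choice is an assumption, since other estimators exist. With these definitions, the data set's skewness is zero while its excess kurtosis is positive. The helper names are hypothetical.

```python
from statistics import fmean


def central_moment(data, k):
    """Hypothetical helper: k-th central moment (population form) of the data."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])


def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Hypothetical data: perfectly symmetric around 0, with two far-out values
data = [-10, -1, -1, 0, 0, 0, 1, 1, 10]
print(round(skewness(data), 3))         # 0.0: no asymmetry
print(round(excess_kurtosis(data), 3))  # 1.326: heavier tails than a normal distribution
```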
Examine the shape of your data to determine whether your data appear to be skewed
When data are skewed, the majority of the data are located on the high or low
side of the graph. Often, skewness is easiest to detect with a histogram or
boxplot.

[Figure: two example histograms, labeled Right-skewed and Left-skewed]
The histogram with right-skewed data shows wait times. Most of the wait times
are relatively short, and only a few wait times are long. The histogram with left-
skewed data shows failure time data. A few items fail immediately, and many
more items fail later.
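A histogram is a count of values per bin. This minimal sketch bins some hypothetical wait times (in minutes) and prints a text histogram. The tallest bar sits at the low end and a thin tail stretches to the right, which is the right-skewed pattern described above. The `bin_counts` helper is hypothetical and assumes non-negative data.

```python
def bin_counts(data, bin_width):
    """Hypothetical helper: count values in bins [0, w), [w, 2w), ...

    Assumes all values are non-negative.
    """
    counts = [0] * (int(max(data) // bin_width) + 1)
    for x in data:
        counts[int(x // bin_width)] += 1
    return counts


# Hypothetical wait times in minutes: mostly short, a few long
wait_times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 19]
for i, c in enumerate(bin_counts(wait_times, 5)):
    print(f"{i * 5:>3}-{(i + 1) * 5:<3} {'#' * c}")
```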

The output has two columns. The left column names the statistic and the right column gives the value of the statistic. For example, the mean of this data set is 1.26 (your data set may be different, so you may get a different value).
The skewness measure is greater than 0 when the distribution is skewed to the right and less than 0 when it is skewed to the left.
The kurtosis measure reported here is excess kurtosis, which is 0 for a normal distribution. Positive values imply a leptokurtic distribution, while negative values imply a platykurtic distribution.

Standard deviation
Standard deviation measures the typical distance between the values and their mean, that is, how spread out the data are around the mean. Strictly, it is the square root of the average squared distance from the mean. A low standard deviation indicates that the data points tend to be close to the mean of the data set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
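Here is a small illustration with hypothetical data. The two lists share the same mean (50) but differ in spread. Python's `statistics.pstdev` (population standard deviation) reports a low value for the tightly clustered list and a high value for the spread-out one.

```python
from statistics import pstdev

# Hypothetical data sets: both have mean 50, but very different spreads
close = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

print(round(pstdev(close), 2))       # 1.41: values sit near the mean
print(round(pstdev(spread_out), 2))  # 14.14: values are far from the mean
```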

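There are situations when we have to choose between the sample and the population standard deviation.
When we are asked to find the SD of only part of a population (a sample drawn from it), we use the sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where x̅ is the mean of the sample and n is the number of values in the sample. When the data cover the whole population, the population standard deviation divides by the population size N instead of n - 1.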
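You can write out the sample formula directly and compare it with the standard library. In this sketch, `sample_sd` is a hypothetical helper that divides by n - 1, like `statistics.stdev`. By contrast, `statistics.pstdev` divides by N and gives the population standard deviation. The data are made up for illustration.

```python
from math import sqrt
from statistics import pstdev, stdev


def sample_sd(data):
    """Hypothetical helper: sqrt(sum((x - x_bar)**2) / (n - 1))."""
    n = len(data)
    x_bar = sum(data) / n
    return sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(data), 2))  # 2.14: sample SD, divides by n - 1
print(round(stdev(data), 2))      # 2.14: same result from the standard library
print(round(pstdev(data), 2))     # 2.0: population SD, divides by N
```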
