Skewness & Kurtosis Simplified
What is Skewness and how do we detect it?

Atul Sharma

Nov 9, 2020 · 4 min read

If you were to ask Mother Nature what her favorite probability distribution is, the answer would be 'Normal', and the reason behind it is the existence of chance/random causes that influence every known variable on earth. What if a process is under the influence of assignable/significant causes as well? This is surely going to distort the shape of the distribution, and that's when we need a measure like skewness to capture it. Below is a normal distribution visual, also known as a bell curve. It is a symmetrical graph with all measures of central tendency in the middle.
(Image by author)

But what if we encounter an asymmetrical distribution? How do we detect the extent of asymmetry? Let's see visually what happens to the measures of central tendency when we encounter such graphs.

(Image by author)

Notice how these central tendency measures spread apart when the normal distribution is distorted. For the nomenclature, just follow the direction of the tail: for the left graph, since the tail is to the left, it is left-skewed (negatively skewed); the right graph has its tail to the right, so it is right-skewed (positively skewed).

How about deriving a measure that captures the horizontal distance between the Mode and the Mean of the distribution? It's intuitive to think that the higher the skewness, the further apart these measures will be. So let's jump to the formula for skewness now:

Skewness = (Mean - Mode) / Standard Deviation

Division by the Standard Deviation enables the relative comparison among distributions on the same standard scale. Since mode calculation as a central tendency for small data sets is not recommended, to arrive at a more robust formula for skewness we will replace the mode with a value derived from the median and the mean:

Mode ≈ 3 × Median - 2 × Mean (*approximately, for moderately skewed distributions)

Replacing the value of mode in the formula of skewness, we get:

Skewness ≈ 3 × (Mean - Median) / Standard Deviation
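As a quick sanity check, this median-based (Pearson) coefficient can be computed in a few lines of Python. This is a minimal sketch using only the standard library; the sample data and the helper name `pearson_skewness` are made up for illustration:

```python
import statistics

def pearson_skewness(data):
    """Pearson's second skewness coefficient: 3 * (Mean - Median) / SD."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    stdev = statistics.stdev(data)  # sample standard deviation
    return 3 * (mean - median) / stdev

# A right-skewed sample: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10, 25]
# A perfectly symmetric sample.
symmetric = [1, 2, 3, 4, 5, 6, 7]

print(pearson_skewness(right_skewed))  # positive, since mean > median
print(pearson_skewness(symmetric))     # zero, since mean == median
```

Because the long right tail pulls the mean above the median, the coefficient comes out positive, matching the tail-direction rule above.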

What is Kurtosis and how do we capture it?

Think of punching or pulling the normal distribution curve from the top. What impact will it have on the shape of the distribution? Let's visualize:
(Image by author)

So there are two things to notice: the peak of the curve and the tails of the curve. The kurtosis measure is responsible for capturing this phenomenon. The formula for kurtosis is complex (it is the 4th moment in the moment-based calculation), so we will stick to the concept and its visual clarity. A normal distribution has a kurtosis of 3 and is called mesokurtic. Distributions with kurtosis greater than 3 are called leptokurtic and those with less than 3 are called platykurtic. So the greater the value, the more the peakedness. Kurtosis ranges from 1 to infinity. As the kurtosis measure for a normal distribution is 3, we can calculate excess kurtosis by keeping the normal distribution as the zero reference. Excess kurtosis then varies from -2 to infinity.

Excess Kurtosis for Normal Distribution = 3 - 3 = 0

The lowest value of Excess Kurtosis occurs when Kurtosis is 1: 1 - 3 = -2

(Image by author)

The topic of kurtosis has been controversial for decades. The basis of kurtosis all these years has been linked with peakedness, but the ultimate verdict is that outliers (fatter tails) govern the kurtosis effect far more than the values near the mean (the peak).
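To see why outliers dominate the fourth moment, here is a small illustrative sketch of the moment-based excess kurtosis, using only the standard library; the data and the helper name `excess_kurtosis` are made up for illustration:

```python
import statistics

def excess_kurtosis(data):
    """Population excess kurtosis: the fourth standardized moment minus 3."""
    n = len(data)
    mean = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    return fourth_moment / sigma ** 4 - 3

# A single outlier in the tail dominates the fourth power,
# pushing kurtosis up far more than values near the mean do.
no_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]
print(excess_kurtosis(no_outlier))    # negative (flat, platykurtic shape)
print(excess_kurtosis(with_outlier))  # higher, pulled up by the outlier
```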

So we can conclude from the above discussion that the horizontal push or pull distortion of a normal distribution curve is captured by the skewness measure, and the vertical push or pull distortion is captured by the kurtosis measure. Also, it is the impact of outliers that dominates the kurtosis effect, which has its roots of proof in the fourth-order moment-based formula. I hope this blog helped you clarify the idea of skewness and kurtosis in a simplified manner; watch out for more similar blogs in the future.

Thanks!!!

Statistics for Data Science: What is Skewness and Why is it Important?

Published July 5, 2020 · Republished by Plato
Overview
 Skewness is a key statistics concept you must know in the data science and analytics fields
 Learn what skewness is, the formula for skewness, and why it's important for you as a data science professional
Introduction
The concept of skewness is baked into our way of thinking. When we look at a
visualization, our minds intuitively discern the pattern in that chart.
Think about it – you look at a chart of a cricket team’s batting performance in a
50-over game and you’ll quickly notice how there’s a sudden deluge of runs in the
last 10 overs. Now think of that in terms of a bar chart – there’s a skew towards
the end, right?
So even if you haven't read up on skewness as a data science or analytics professional, you have definitely interacted with the concept informally. And it's actually a pretty easy topic in statistics, yet a lot of folks skim through it in their haste to learn other seemingly complex data science concepts. To me, that's a mistake.
https://itfeature.com/statistics/skewness-measure-of-asymmetry
https://itfeature.com/statistics/skewness-introduction-formula-interpretation

Positive (Right) Skewness Example

A scientist has 1,000 people complete some psychological tests. For test 5, the test scores have skewness = 2.0. A histogram of these scores is shown below.
The histogram shows a very asymmetrical frequency distribution. Most people score 20 points or lower but the right tail stretches out to 90 or so. This distribution is right skewed.
If we move to the right along the x-axis, we go from 0 to 20 to 40 points and so on. So towards the right of the graph, the scores become more positive. Therefore, right skewness is positive skewness, which means skewness > 0. This first example has skewness = 2.0, as indicated in the top right corner of the graph. The scores are strongly positively skewed.

Negative (Left) Skewness Example

Another variable -the scores on test 2- turns out to have skewness = -1.0. Its histogram is shown below.
The bulk of scores are between 60 and 100 or so. However, the left tail is stretched out somewhat. So this distribution is left skewed.
Right: to the left, to the left. If we follow the x-axis to the left, we move towards more negative scores. This is why left skewness is negative skewness. And indeed, skewness = -1.0 for these scores. Their distribution is left skewed. However, it is less skewed -or more symmetrical- than our first example, which had skewness = 2.0.

Symmetrical Distribution Implies Zero Skewness

Finally, symmetrical distributions have skewness = 0. The scores on test 3 -having skewness = 0.1- come close.
Now, observed distributions are rarely precisely symmetrical. This is mostly seen for some theoretical sampling distributions. Some examples are
 the (standard) normal distribution;
 the t distribution; and
 the binomial distribution if p = 0.5.
These distributions are all exactly symmetrical and thus have skewness = 0.000...

Population Skewness - Formula and Calculation


If you'd like to compute skewnesses for one or more variables, just leave the
calculations to some software. But -just for the sake of completeness- I'll list the
formulas anyway.
If your data contain your entire population, compute the population skewness as:
Population skewness = Σ((Xi − μ) / σ)³ · 1/N

where
 Xi is each individual score;
 μ is the population mean;
 σ is the population standard deviation; and
 N is the population size.
For an example calculation using this formula, see this Googlesheet (shown below).
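As a cross-check on the formula, here is a minimal Python sketch of the same population skewness calculation, using only the standard library; the data and the helper name `population_skewness` are made up for illustration, and the result should agree with a spreadsheet's =SKEW.P(...) on the same values:

```python
import math

def population_skewness(data):
    """Third standardized moment: Σ((Xi − μ) / σ)³ · 1/N."""
    n = len(data)
    mu = sum(data) / n
    # Population standard deviation (divides by N, not N - 1).
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** 3 for x in data) / n

print(population_skewness([1, 2, 2, 3, 3, 3, 4, 4, 10, 25]))  # positive (right tail)
print(population_skewness([1, 2, 3, 4, 5]))                   # zero (symmetric)
```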

It also shows how to obtain population skewness directly by using =SKEW.P(...), where “.P” means “population”. This confirms the outcome of our manual calculation. Sadly, neither SPSS nor JASP computes population skewness: both are limited to sample skewness.

Sample Skewness - Formula and Calculation


If your data hold a simple random sample from some population, use
Sample skewness = N · Σ(Xi − X̄)³ / (S³ · (N − 1) · (N − 2))

where
 Xi is each individual score;
 X̄ is the sample mean;
 S is the sample standard deviation; and
 N is the sample size.
An example calculation is shown in this Googlesheet (shown below).

An easier option for obtaining sample skewness is using =SKEW(...), which confirms the outcome of our manual calculation.
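The sample formula translates directly into Python as well. This is an illustrative sketch with made-up data; the helper name `sample_skewness` is an assumption, and the adjusted formula it implements is the one spreadsheet =SKEW(...) is generally documented to use:

```python
import math

def sample_skewness(data):
    """N · Σ(Xi − X̄)³ / (S³ · (N − 1) · (N − 2)), as in the formula above."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divides by N - 1).
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n * sum((x - mean) ** 3 for x in data) / (s ** 3 * (n - 1) * (n - 2))

print(sample_skewness([1, 2, 2, 3, 3, 3, 4, 4, 10, 25]))  # positive (right tail)
print(sample_skewness([1, 2, 3, 4, 5]))                   # zero (symmetric)
```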

Skewness in SPSS

First off, “skewness” in SPSS always refers to sample skewness: it quietly assumes that your data hold a sample rather than an entire population. There are plenty of options for obtaining it. My favorite is via MEANS because the syntax and output are clean and simple. The screenshots below guide you through.

https://www.spss-tutorials.com/skewness/

The syntax can be as simple as

means v1 to v5
/cells skew.

A very complete table -including means, standard deviations, medians and more- is run from

means v1 to v5
/cells count min max mean median stddev skew kurt.

The result is shown below.

Skewness - Implications for Data Analysis

Many analyses -ANOVA, t-tests, regression and others- require the normality assumption: variables should be normally distributed in the population. The normal distribution has skewness = 0, so observing substantial skewness in some sample data suggests that the normality assumption is violated.
Such violations of normality are no problem for large sample sizes -say N > 20 or 25 or so. In this case, most tests are robust against such violations, due to the central limit theorem. In short, for large sample sizes, skewness is no real problem for statistical tests. However, skewness is often associated with large standard deviations. These may result in large standard errors and low statistical power. Like so, substantial skewness may decrease the chance of rejecting some null hypothesis in order to demonstrate some effect. In this case, a nonparametric test may be a wiser choice as it may have more power.
Violations of normality do pose a real threat for small sample sizes of -say- N < 20 or so. With small sample sizes, many tests are not robust against a violation of the normality assumption. The solution -once again- is using a nonparametric test because these don't require normality.
Last but not least, there isn't any statistical test for examining if population skewness = 0. An indirect way of testing this is a normality test such as
 the Kolmogorov-Smirnov normality test and
 the Shapiro-Wilk normality test.
However, when normality is really needed -with small sample sizes- such tests have low power: they may not reach statistical significance even when departures from normality are severe. Like so, they mainly provide you with a false sense of security.
And that's about it, I guess. If you've any remarks -either positive or negative- please throw in a comment below. We do love a bit of discussion.
Skewness is a fundamental statistics concept that everyone in data science and
analytics needs to know. It is something that we simply can’t run away from. And
I’m sure you’ll understand this by the end of this article.
Here, we’ll be discussing the concept of skewness in the easiest way possible.
You’ll learn about skewness, its types, and its importance in the field of data
science. So buckle up because you’ll learn a concept that you’ll value during your
entire data science career.
Note: Here are a couple of resources to help you dive deeper into the world of
statistics for data science:
Table of Contents
 What is Skewness?
 Why is Skewness Important?
 What is a Normal Distribution?
 Understanding Positively Skewed Distribution
 Understanding Negatively Skewed Distribution
What is Skewness?
Skewness is the measure of the asymmetry of a probability distribution and is
given by the third standardized moment. If that sounds way too complex, don’t
worry! Let me break it down for you.
In simple words, skewness is the measure of how much the probability
distribution of a random variable deviates from the normal distribution. Now, you
might be thinking – why am I talking about normal distribution here?
Well, the normal distribution is the probability distribution without any skewness. You can look at the image below, which shows a symmetrical distribution, basically a normal distribution, and you can see that it is symmetrical on both sides of the dashed line. Apart from this, there are two types of skewness:
 Positive Skewness
 Negative Skewness

Credits: Wikipedia
The probability distribution with its tail on the right side is a positively skewed
distribution and the one with its tail on the left side is a negatively skewed
distribution. If you’re finding the above figures confusing, that’s alright. We’ll
understand this in more detail later.
Before that, let’s understand why skewness is such an important concept for you
as a data science professional.
Why is Skewness Important?
Now, we know that the skewness is the measure of asymmetry and its types are
distinguished by the side on which the tail of probability distribution lies. But why
is knowing the skewness of the data important?
First, linear models work on the assumption that the distributions of the variables involved are well behaved, i.e., close to normal. Therefore, knowing about the skewness of data helps us in creating better linear models.
Secondly, let’s take a look at the below distribution. It is the distribution of
horsepower of cars:

You can clearly see that the above distribution is positively skewed. Now, let’s say
you want to use this as a feature for the model which will predict the mpg (miles
per gallon) of a car.
Since our data is positively skewed here, it means that it has a higher number of
data points having low values, i.e., cars with less horsepower. So when we train
our model on this data, it will perform better at predicting the mpg of cars with
lower horsepower as compared to those with higher horsepower. This is similar
to how class imbalance happens in classification problems.
Also, skewness tells us about the direction of outliers. You can see that our
distribution is positively skewed and most of the outliers are present on the right
side of the distribution.
Note: The skewness does not tell us about the number of outliers. It only tells us
the direction.
Now that we know why skewness is important, let's understand the distributions which I showed you earlier.
What is a Symmetric/Normal Distribution?

Credits: Wikipedia
Yes, we’re back again with the normal distribution. It is used as a reference for
determining the skewness of a distribution. As I mentioned earlier, the normal
distribution is the probability distribution with almost no skewness. It is nearly
perfectly symmetrical. Due to this, the value of skewness for a normal distribution
is zero.
But why is it nearly perfectly symmetrical and not absolutely symmetrical? That's because, in reality, no real-world data has a perfectly normal distribution. Therefore, even the value of skewness is not exactly zero; it is nearly zero. The value of zero is nonetheless used as a reference for determining the skewness of a distribution.
You can see in the above image that the same line represents the mean, median,
and mode. It is because the mean, median, and mode of a perfectly normal
distribution are equal.
So far, we’ve understood the skewness of normal distribution using a probability
or frequency distribution. Now, let’s understand it in terms of a boxplot because
that’s the most common way of looking at a distribution in the data science
space.

The above image is a boxplot of a symmetric distribution. You'll notice here that the distance between Q1 and Q2 and between Q2 and Q3 is equal, i.e.:

Q2 - Q1 = Q3 - Q2

But that's not enough for concluding if a distribution is skewed or not. We also take a look at the lengths of the whiskers; if they are equal, then we can say that the distribution is symmetric, i.e. it is not skewed.
Now that we’ve discussed the skewness in the normal distribution, it’s time to
learn about the two types of skewness which we discussed earlier. Let’s start with
positive skewness.
Understanding Positively Skewed Distribution
Source: Wikipedia
A positively skewed distribution is the distribution with the tail on its right side.
The value of skewness for a positively skewed distribution is greater than zero. As
you might have already understood by looking at the figure, the value of mean is
the greatest one followed by median and then by mode.
So why is this happening?

Well, the tail of the distribution being on the right pulls the mean toward the right, causing it to be greater than the median. Also, the mode occurs at the highest frequency of the distribution, which is on the left side of the median. Therefore, mode < median < mean.

In the above boxplot, you can see that Q2 is present nearer to Q1. This represents a positively skewed distribution. In terms of quartiles, it can be given by:

Q3 - Q2 > Q2 - Q1
In this case, it was very easy to tell if the data is skewed or not. But what if we
have something like this:
Here, Q2-Q1 and Q3-Q2 are equal and yet the distribution is positively skewed.
The keen-eyed among you will have noticed the length of the right whisker is
greater than the left whisker. From this, we can conclude that the data is
positively skewed.
So, the first step is always to check the equality of Q2-Q1 and Q3-Q2. If they are found to be equal, we then look at the lengths of the whiskers.
Understanding Negatively Skewed Distribution

Source: Wikipedia
As you might have already guessed, a negatively skewed distribution is the
distribution with the tail on its left side. The value of skewness for a negatively
skewed distribution is less than zero. You can also see in the above figure that
the mean < median < mode.
In the boxplot, the relationship between the quartiles for negative skewness is given by:

Q2 - Q1 > Q3 - Q2
Similar to what we did earlier, if Q3-Q2 and Q2-Q1 are equal, then we look for the
length of whiskers. And if the length of the left whisker is greater than that of the
right whisker, then we can say that the data is negatively skewed.
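The two-step boxplot check described above (first compare Q2-Q1 with Q3-Q2, then fall back to whisker lengths) can be sketched in Python. This is an illustrative sketch using only the standard library; the helper name `quartile_skew_direction` and the sample data are made up:

```python
import statistics

def quartile_skew_direction(data):
    """Classify skew direction from quartile spacing, boxplot-style."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartiles
    upper, lower = q3 - q2, q2 - q1
    if upper != lower:
        # Unequal quartile spacing already tells us the direction.
        return "positive" if upper > lower else "negative"
    # Equal quartile spacing: fall back to comparing whisker lengths.
    left_whisker = q1 - min(data)
    right_whisker = max(data) - q3
    if right_whisker > left_whisker:
        return "positive"
    if right_whisker < left_whisker:
        return "negative"
    return "symmetric"

print(quartile_skew_direction([1, 2, 2, 3, 3, 3, 4, 4, 10, 25]))  # positive
print(quartile_skew_direction([1, 2, 3, 4, 5, 6, 7]))             # symmetric
```

Note that `statistics.quantiles` uses the "exclusive" method by default, so its quartile values may differ slightly from those your plotting library draws.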

How Do We Transform Skewed Data?

Since you know how much skewed data can affect our machine learning model's predictive capabilities, it is better to transform skewed data into normally distributed data. Here are some of the ways you can transform your skewed data:
 Power Transformation
 Log Transformation
 Exponential Transformation
Note: The selection of transformation depends on the statistical characteristics of
the data.
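As an illustrative sketch of one of these options, here is a log transformation applied to a made-up right-skewed sample, with Pearson's median-based skewness coefficient used to measure the before/after skew (the helper name `pearson_skew` is an assumption):

```python
import math
import statistics

def pearson_skew(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    return 3 * (statistics.fmean(data) - statistics.median(data)) / statistics.stdev(data)

# A made-up right-skewed sample (long right tail).
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10, 25]

# The log compresses large values far more than small ones, pulling the
# right tail in. Note the log transform requires strictly positive data.
log_transformed = [math.log(x) for x in skewed]

print(pearson_skew(skewed))           # strongly positive
print(pearson_skew(log_transformed))  # noticeably smaller
```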
End Notes
In this article, we covered the concept of skewness, its types and why it is
important in the data science field. We discussed skewness at the conceptual
level, but if you want to dig deeper, you can explore its mathematical part as the
next step.
