How to Calculate Standard Deviation (Guide) | Calculator & Examples


Published on September 17, 2020 by Pritha Bhandari. Revised on January 20, 2023.

The standard deviation is the average amount of variability in your dataset. It tells you, on
average, how far each value lies from the mean.

A high standard deviation means that values are generally far from the mean, while a low
standard deviation indicates that values are clustered close to the mean.


What does standard deviation tell you?


Standard deviation is a useful measure of spread for normal distributions.

In normal distributions, data is symmetrically distributed with no skew. Most values cluster
around a central region, with values tapering off as they go further away from the center. The
standard deviation tells you how spread out from the center of the distribution your data is on
average.

Many scientific variables follow normal distributions, including height, standardized test scores,
and job satisfaction ratings. When you have the standard deviations of different samples, you can
compare their distributions using statistical tests to make inferences about the larger populations
they came from.

Example: Comparing different standard deviations
You collect data on job satisfaction ratings from three groups of employees using simple random
sampling.

The mean (M) ratings are the same for each group – it’s the value on the x-axis when the curve is
at its peak. However, their standard deviations (SD) differ from each other.

The standard deviation reflects the dispersion of the distribution. The curve with the lowest
standard deviation has a high peak and a small spread, while the curve with the highest standard
deviation is flatter and more spread out.
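
As a rough illustration of this comparison, the Python sketch below uses three small sets of ratings that share a mean of 5 but differ in spread. The group names and values are hypothetical, invented for this illustration, and are not the data behind the example above. It uses the standard library's statistics module and the sample standard deviation.

```python
import statistics

# Hypothetical job satisfaction ratings for three groups (invented for illustration).
ratings = {
    "Group A": [4, 5, 5, 5, 5, 5, 6],
    "Group B": [3, 4, 5, 5, 5, 6, 7],
    "Group C": [1, 3, 4, 5, 6, 7, 9],
}

# Mean and sample standard deviation (n - 1 divisor) for each group.
summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in ratings.items()
}

for name, (mean, sd) in summary.items():
    print(f"{name}: M = {mean:.1f}, SD = {sd:.2f}")
# Group A: M = 5.0, SD = 0.58
# Group B: M = 5.0, SD = 1.29
# Group C: M = 5.0, SD = 2.65
```

All three groups have the same mean, so only the standard deviation distinguishes the tightly clustered group from the widely spread one.
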
The empirical rule
The standard deviation and the mean together can tell you where most of the values in
your frequency distribution lie if they follow a normal distribution.

The empirical rule, or the 68-95-99.7 rule, tells you where your values lie:

• Around 68% of scores are within 1 standard deviation of the mean,
• Around 95% of scores are within 2 standard deviations of the mean,
• Around 99.7% of scores are within 3 standard deviations of the mean.

Example: Standard deviation in a normal distribution
You administer a memory recall test to a group of students. The data follows a normal
distribution with a mean score of 50 and a standard deviation of 10.

Following the empirical rule:

• Around 68% of scores are between 40 and 60.
• Around 95% of scores are between 30 and 70.
• Around 99.7% of scores are between 20 and 80.

The empirical rule is a quick way to get an overview of your data and check for any outliers or
extreme values that don’t follow this pattern.
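
The intervals in the memory test example can be reproduced with a short Python sketch. It assumes the scores are exactly normal with a mean of 50 and a standard deviation of 10, and uses NormalDist from the standard library (Python 3.8 or later). The helper name empirical_intervals is made up for this illustration.

```python
from statistics import NormalDist

def empirical_intervals(mean, sd):
    """Return (k, low, high, share) for k = 1, 2, 3 standard deviations."""
    dist = NormalDist(mu=mean, sigma=sd)
    rows = []
    for k in (1, 2, 3):
        low, high = mean - k * sd, mean + k * sd
        # Share of a normal distribution that falls between low and high.
        share = dist.cdf(high) - dist.cdf(low)
        rows.append((k, low, high, share))
    return rows

for k, low, high, share in empirical_intervals(50, 10):
    print(f"within {k} SD: {low} to {high} -> {share:.1%}")
# within 1 SD: 40 to 60 -> 68.3%
# within 2 SD: 30 to 70 -> 95.4%
# within 3 SD: 20 to 80 -> 99.7%
```

The computed shares are the more precise values that the 68-95-99.7 rule rounds.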

Note: For non-normal distributions, the standard deviation is a less reliable measure of variability
and should be used in combination with other measures like the range or interquartile range.

Standard deviation formulas for populations and samples

Different formulas are used for calculating standard deviations depending on whether you
have collected data from a whole population or a sample.

Population standard deviation


When you have collected data from every member of the population that you’re interested in,
you can get an exact value for population standard deviation.

The population standard deviation formula looks like this:

Formula:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Explanation:

• σ = population standard deviation
• ∑ = sum of…
• X = each value
• μ = population mean
• N = number of values in the population

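A minimal Python sketch of the population calculation: subtract the mean from each value, square the deviations, average them over all N values, and take the square root. The eight values are a hypothetical population invented for illustration, and population_sd is a made-up helper name. The result is checked against the standard library's statistics.pstdev.

```python
import math
import statistics

def population_sd(values):
    """Population standard deviation: sqrt(sum((X - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n                                  # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]  # (X - mu)^2 for each value
    return math.sqrt(sum(squared_deviations) / n)         # divide by N, then root

# Hypothetical complete population of eight values (invented for illustration).
population = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(population))      # 2.0
print(statistics.pstdev(population))  # 2.0
```
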
Sample standard deviation


When you collect data from a sample, the sample standard deviation is used to make estimates
or inferences about the population standard deviation.

The sample standard deviation formula looks like this:

Formula:

$$s = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$$

Explanation:

• s = sample standard deviation
• ∑ = sum of…
• X = each value
• x̄ = sample mean
• n = number of values in the sample

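A minimal Python sketch of the sample calculation, which differs from the population version only in dividing by n – 1. The eight values are a hypothetical sample invented for illustration, and sample_sd is a made-up helper name. The result is compared with the standard library's statistics.stdev, and statistics.pstdev is included to show what dividing by n would give for the same values.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((X - x_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n                                  # sample mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (X - x_bar)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))      # divide by n - 1

# Hypothetical sample of eight values (invented for illustration).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(round(sample_sd(sample), 3))         # 2.138
print(round(statistics.stdev(sample), 3))  # 2.138
print(statistics.pstdev(sample))           # 2.0, the smaller value from dividing by n
```
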
With samples, we use n – 1 in the formula because using n would give us a biased estimate that
consistently underestimates variability. Deviations are measured from the sample mean, which is
the value that minimizes the sum of squared deviations for that sample, so they tend to be smaller
than deviations from the true population mean. As a result, the sample standard deviation would
tend to be lower than the real standard deviation of the population.

Reducing the divisor from n to n – 1 makes the standard deviation slightly larger, which
compensates for this tendency.

With n – 1, the estimate of the variance is unbiased. The standard deviation, its square root, is
still not a perfectly unbiased estimate, but it is less biased than the version that divides by n.
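
The effect of the divisor can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it draws many samples of size 5 from a normal population with a standard deviation of 10 (a variance of 100), and averages the variance estimates obtained with n and with n – 1. The function name, sample size, number of trials, and seed are all arbitrary choices made for this illustration.

```python
import random

def simulate(trials=20000, n=5, mu=50.0, sigma=10.0, seed=0):
    """Average the n-divisor and (n - 1)-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    total_var_n = 0.0
    total_var_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        sum_sq = sum((x - x_bar) ** 2 for x in sample)
        total_var_n += sum_sq / n         # divides by n
        total_var_n1 += sum_sq / (n - 1)  # divides by n - 1
    return total_var_n / trials, total_var_n1 / trials

mean_var_n, mean_var_n1 = simulate()
print(f"true variance:            {10.0 ** 2:.1f}")
print(f"average using n:          {mean_var_n:.1f}")
print(f"average using n - 1:      {mean_var_n1:.1f}")
```

Under these assumptions, the average using n should land near 80 (that is, 100 × (n – 1) / n), while the average using n – 1 should land near the true variance of 100.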
