Ch.3 Normal Distribution

The normal distribution is a continuous probability distribution that can be used to model a vast number of naturally occurring scenarios. As a result, it is one of the most important probability distributions in statistics. Examples of natural variables that follow the normal distribution are IQ scores, height and weight. The key difference between the normal distribution and the distributions you met in Year 1 is that the normal distribution is continuous, while the others are discrete.

Characteristics
The normal distribution $X \sim N(\mu, \sigma^2)$:
§ has two parameters: the mean, $\mu$, and the variance, $\sigma^2$.
§ is symmetrical (mean = median = mode).
§ has a bell-shaped curve with asymptotes at each end.
§ has total area under the curve equal to 1.
§ has $P(X = a) = 0$ for any $a$. This is true for any continuous distribution.

If a variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, we write $X \sim N(\mu, \sigma^2)$.
It is often very helpful to sketch the normal curve when solving normal distribution problems.

Finding probabilities
§ You need to be able to use the normal cumulative distribution function on your calculator to find probabilities. You will need to enter a lower and an upper bound when using this function, as well as the mean and standard deviation of the distribution being used.

Example 1: The random variable $X \sim N(30, 2^2)$. Find $P(X < 33)$.
Using the cumulative function on your calculator, we enter our mean as 30 and our standard deviation as 2. Our upper bound will be 33, and for our lower bound we need to enter a very small value. The normal curve is asymptotic at both ends, so to get an accurate approximation of the total area to the left of $x = 33$ we take the lower bound to be a value far below the mean, for example $-500$.
$\Rightarrow P(X < 33) = 0.933$ (3 s.f.)

The inverse normal distribution function
§ You can use the inverse normal function on your calculator to find the value of $a$ such that $P(X < a) = p$.
Some problems might instead require you to find the value of $a$ such that $P(X > a) = p$. Be aware that most calculators (e.g. the Casio fx-991EX) will only return the value of $a$ such that $P(X < a) = p$, so you will need to use the property $P(X > a) = 1 - P(X < a)$ in such situations. See Example 3 for more detail.

Example 2: Given that $X \sim N(30, 2^2)$, find the value of $a$ such that $P(X < a) = 0.4$.
Using the inverse normal function with $p = 0.4$, mean = 30 and standard deviation = 2 gives us $a = 29.5$ (3 s.f.).

Example 3: Given that $X \sim N(30, 2^2)$, find the value of $a$ such that $P(X > a) = 0.22$.
We first manipulate our expression:
$P(X > a) = 1 - P(X < a) = 0.22$
$\therefore P(X < a) = 0.78$
Now using the inverse normal function with $p = 0.78$, $\mu = 30$ and $\sigma = 2$ gives us $a = 31.5$ (3 s.f.).

Note that some calculators (e.g. the Casio CG-50) allow you to specify which tail of the distribution the probability $p$ corresponds to, eliminating the need for any manipulation in problems similar to Example 3.

The standard normal distribution
We can use a coding to standardise our data, making it much easier to analyse and work with. For problems where the mean and/or variance are unknown, it is very useful to code data via the standard normal distribution.
§ The standard normal distribution, denoted $Z \sim N(0, 1^2)$, has mean 0 and standard deviation 1.
§ If $X \sim N(\mu, \sigma^2)$, you can use the coding $Z = \frac{X - \mu}{\sigma}$ to convert your variable into a standard normal variable.
§ The notation $\Phi(a)$ is equivalent to $P(Z < a)$.

Example 4: The random variable $X \sim N(50, 4^2)$. Write $P(X \geq 55)$ in terms of $\Phi(z)$ for some value $z$.
Since the normal distribution is continuous, $P(X \geq 55) = P(X > 55)$, and
$P(X > 55) = 1 - P(X < 55)$
Converting to standard normal:
$$P(X < 55) = P\left(Z < \frac{55 - \mu}{\sigma}\right) = P\left(Z < \frac{55 - 50}{4}\right) = P(Z < 1.25)$$
So $P(X \geq 55) = 1 - P(Z < 1.25) = 1 - \Phi(1.25)$.

Finding $\mu$ and $\sigma$
You need to be able to use the standard normal distribution to solve problems where you must find the mean and/or variance. You will be given either one or two probabilities, which you must standardise in order to find the unknown parameters. We will go through two examples: one in which one parameter is missing and one in which both parameters are missing.

Example 5: The random variable $X \sim N(\mu, 5^2)$ and $P(X < 18) = 0.9032$. Find the value of $\mu$.
We are told that $P(X < 18) = 0.9032$. Standardising:
$$P(X < 18) = P\left(Z < \frac{18 - \mu}{5}\right) = 0.9032$$
We can now use the inverse normal function (with $\mu = 0$, $\sigma = 1$, area $= 0.9032$) to find the value of $a$ such that $P(Z < a) = 0.9032$. The calculator returns $a = 1.30$, so
$$\frac{18 - \mu}{5} = 1.30 \Rightarrow \mu = 11.5$$

Example 6: The random variable $X \sim N(\mu, \sigma^2)$. Given that $P(X < 17) = 0.8159$ and $P(X < 25) = 0.9970$, find the values of $\mu$ and $\sigma$.
Hint: Try sketching the probability curve to help visualise the distribution.
We are told that $P(X < 17) = 0.8159$ and $P(X < 25) = 0.9970$. We standardise both of these cases separately, following the same method as in Example 5:
$$\Rightarrow P(X < 17) = P\left(Z < \frac{17 - \mu}{\sigma}\right) = 0.8159$$
$$\Rightarrow P(X < 25) = P\left(Z < \frac{25 - \mu}{\sigma}\right) = 0.9970$$
Using the inverse normal function for both probabilities, we obtain two equations:
$$\frac{17 - \mu}{\sigma} = 0.8998 \Rightarrow 17 - \mu = 0.8998\sigma$$
$$\frac{25 - \mu}{\sigma} = 2.748 \Rightarrow 25 - \mu = 2.748\sigma$$
We now have two equations with two unknowns, so we can solve for $\mu$ and $\sigma$ (you could use your calculator for this part). Subtracting the first equation from the second eliminates $\mu$ and gives $8 = 1.8482\sigma$.
Solving gives us $\mu = 13.1$, $\sigma = 4.33$ (3 s.f.)

Approximating a binomial distribution
Under certain conditions, the binomial distribution is statistically very similar to the normal distribution. As a result, the normal distribution can be used as an approximation for the binomial distribution.
§ If $n$ is large and $p$ is close to 0.5, the binomial distribution $X \sim B(n, p)$ can be approximated by the normal distribution $N(\mu, \sigma^2)$, where
- $\mu = np$
- $\sigma^2 = np(1 - p)$
Recall that the mean and the variance of a binomially distributed random variable are $np$ and $np(1 - p)$ respectively.
The reason that $p$ has to be close to 0.5 is that the normal distribution is symmetrical. Note that there is no specific range that $p$ must fall into for this approximation to be valid: the closer $p$ is to 0.5, the better the approximation.

However, there is a small problem: the binomial distribution is discrete while the normal distribution is continuous, which makes the approximation inaccurate if it is applied naively. To see why, take for example the probability $P(X > 5)$ where $X$ is discrete. This probability is associated with $X$ taking the values 6, 7, 8, 9 and beyond. If we use a normal approximation to find $P(X > 5)$, we also include the probability that $X$ lies between 5 and 6, which we did not want to begin with. To tackle this issue, we apply a process known as the continuity correction:
§ When using a normal distribution to approximate a binomial distribution, you must apply a continuity correction to ensure maximal accuracy when calculating probabilities.

Applying the continuity correction is as simple as adding or subtracting 0.5 from your value. You can use the following table to help you decide when to add or subtract 0.5:

Discrete | Continuous
$P(X = a)$ | $P(a - 0.5 < X < a + 0.5)$
$P(X < a)$ | $P(X < a - 0.5)$
$P(X > a)$ | $P(X > a + 0.5)$
$P(X \leq a)$ | $P(X \leq a + 0.5)$
$P(X \geq a)$ | $P(X \geq a - 0.5)$

Example 7: A drill bit manufacturer claims that 52% of its bits last longer than 40 hours. A random sample of 600 bits is taken. Using a suitable approximation, find the probability that between 300 and 350 bits last longer than 40 hours.
Let $X$ be the number of bits that last longer than 40 hours in a batch of 600. Then $X \sim B(600, 0.52)$. Since $n$ is large and $p$ is close to 0.5, we can use a normal approximation with $\mu = np$ and $\sigma^2 = np(1 - p)$:
$\Rightarrow np = 600(0.52) = 312$
$\Rightarrow np(1 - p) = 149.76$
$\therefore$ $X$ is approximately distributed as $N(312, 149.76)$.
We want to find $P(300 < X < 350)$. Applying the continuity correction, this becomes $P(300.5 < X < 349.5)$. Using the calculator (remember to enter the standard deviation $\sqrt{149.76} = 12.24$, not the variance) gives us an answer of 0.825 (3 s.f.).

Hypothesis testing with the normal distribution
You also need to be able to test hypotheses about the mean of a normal distribution by looking at a sample. You will need to use the following fact:
§ For a random sample of size $n$ taken from a random variable $X \sim N(\mu, \sigma^2)$, the sample mean $\bar{X}$ is normally distributed with $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
The idea is that we use the distribution of the sample mean to see whether the mean of an actual sample is significant enough to reject the null hypothesis.

Example 8: The diameters of circular cardboard drink mats produced by a particular machine are normally distributed with mean 9 cm and standard deviation 0.15 cm. After the machine is serviced, a random sample of 30 mats is selected and their diameters are measured to see if the mean diameter has decreased. The mean of the sample was 8.95 cm. Test, at the 5% significance level, whether there is significant evidence to suggest that the mean diameter of the mats produced by the machine has decreased.
We are testing whether the mean has decreased, so this is a one-tailed test.
$\therefore$ our hypotheses are: $H_0: \mu = 9$, $H_1: \mu < 9$
Let $X$ be the diameter of the drink mats. Then $X \sim N(9, 0.15^2)$.
Since the sample size is 30, our sample mean distribution will therefore be $\bar{X} \sim N\left(9, \frac{0.15^2}{30}\right)$.
We use this to find $P(\bar{X} < 8.95)$ and compare this value to 5%.
Using a calculator, we find that $P(\bar{X} < 8.95) = 0.034 < 0.05$
$\therefore$ Our result is significant, and we can conclude that there is sufficient evidence to suggest the mean has decreased (reject $H_0$).
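The calculator steps in Examples 1 to 3 can be checked with a short script. This is a minimal sketch, assuming Python 3.8 or later, where the standard-library class `statistics.NormalDist` provides `cdf` (the cumulative distribution function) and `inv_cdf` (the inverse normal function). It stands in for the calculator and is not part of the original notes.

```python
from statistics import NormalDist

# Plays the role of the calculator's normal distribution mode: mean 30, standard deviation 2
X = NormalDist(mu=30, sigma=2)

# Example 1: P(X < 33). The calculator needs a lower bound; -500 is so far below
# the mean that the area to its left is effectively zero.
p1 = X.cdf(33) - X.cdf(-500)

# Example 2: the value a with P(X < a) = 0.4
a2 = X.inv_cdf(0.4)

# Example 3: P(X > a) = 0.22, so P(X < a) = 1 - 0.22 = 0.78
a3 = X.inv_cdf(1 - 0.22)

print(round(p1, 3), round(a2, 1), round(a3, 1))  # 0.933 29.5 31.5
```

The $-500$ lower bound is kept only to mirror the calculator method; `X.cdf(33)` on its own gives the same value, because `cdf` already measures the whole area to the left.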
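Examples 4 to 6 all reduce to the standard normal $Z \sim N(0, 1)$. The sketch below, again assuming Python's `statistics.NormalDist`, standardises in Example 4 and inverts $\Phi$ in Examples 5 and 6, solving the pair of simultaneous equations from Example 6 by subtraction. The variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Example 4: X ~ N(50, 4^2), so P(X >= 55) = 1 - Phi((55 - 50) / 4)
z4 = (55 - 50) / 4
p4 = 1 - Z.cdf(z4)

# Example 5: (18 - mu) / 5 = Phi^-1(0.9032), so mu = 18 - 5 * Phi^-1(0.9032)
z5 = Z.inv_cdf(0.9032)
mu5 = 18 - 5 * z5

# Example 6: 17 - mu = z1 * sigma and 25 - mu = z2 * sigma.
# Subtracting the first equation from the second eliminates mu: 8 = (z2 - z1) * sigma
z1 = Z.inv_cdf(0.8159)
z2 = Z.inv_cdf(0.9970)
sigma6 = (25 - 17) / (z2 - z1)
mu6 = 17 - z1 * sigma6

print(z4, round(p4, 3))                 # 1.25 0.106
print(round(z5, 2), round(mu5, 1))      # 1.3 11.5
print(round(mu6, 1), round(sigma6, 2))  # 13.1 4.33
```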
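The continuity-correction table can be written as a small function and compared against exact binomial probabilities. In this sketch `corrected_prob` and `exact_prob` are hypothetical helper names, and the test case $X \sim B(100, 0.5)$ with $a = 55$ is chosen for illustration only; it does not come from the notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def corrected_prob(n, p, op, a):
    """Normal approximation to P(X op a) for X ~ B(n, p), applying the continuity correction."""
    y = NormalDist(n * p, sqrt(n * p * (1 - p)))  # N(np, np(1 - p)); NormalDist takes the SD
    if op == "==":
        return y.cdf(a + 0.5) - y.cdf(a - 0.5)    # P(a - 0.5 < Y < a + 0.5)
    if op == "<":
        return y.cdf(a - 0.5)                     # P(Y < a - 0.5)
    if op == ">":
        return 1 - y.cdf(a + 0.5)                 # P(Y > a + 0.5)
    if op == "<=":
        return y.cdf(a + 0.5)                     # P(Y <= a + 0.5)
    if op == ">=":
        return 1 - y.cdf(a - 0.5)                 # P(Y >= a - 0.5)
    raise ValueError(f"unsupported comparison: {op!r}")


def exact_prob(n, p, op, a):
    """Exact binomial P(X op a), found by summing the probability mass function."""
    keep = {
        "==": lambda k: k == a,
        "<": lambda k: k < a,
        ">": lambda k: k > a,
        "<=": lambda k: k <= a,
        ">=": lambda k: k >= a,
    }[op]
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1) if keep(k))


# Compare the approximation with the exact value for X ~ B(100, 0.5) and a = 55
for op in ["==", "<", ">", "<=", ">="]:
    approx = corrected_prob(100, 0.5, op, 55)
    exact = exact_prob(100, 0.5, op, 55)
    print(f"P(X {op} 55): approx {approx:.4f}, exact {exact:.4f}")
```

With $p = 0.5$ the binomial is symmetrical, so the corrected approximation should agree closely with the exact values in every row.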
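Example 7 can be reproduced the same way. This sketch assumes Python 3.8+ (`math.comb`, `statistics.NormalDist`). As an extra check that is not part of the original example, it also sums the exact binomial probabilities so that the approximation can be compared with the true value.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 600, 0.52
mu = n * p               # 312
var = n * p * (1 - p)    # 149.76
# NormalDist expects the standard deviation, so take the square root of the variance
y = NormalDist(mu, sqrt(var))

# "Between 300 and 350" means X = 301, ..., 349, i.e. P(300 < X < 350).
# With the continuity correction this becomes P(300.5 < Y < 349.5).
approx = y.cdf(349.5) - y.cdf(300.5)

# Exact binomial value, summing P(X = k) for k = 301, ..., 349 (a check, not an exam step)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(301, 350))

print(round(approx, 3))  # 0.825
print(round(exact, 3))
```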
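The test in Example 8 can be sketched in a few lines, assuming Python's `statistics.NormalDist` in place of the calculator. The p-value comparison follows the worked example. The critical-value line is an alternative approach added here for illustration and is not used in the notes.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 9        # mean diameter under H0 (cm)
sigma = 0.15   # known standard deviation (cm)
n = 30         # sample size
x_bar = 8.95   # observed sample mean (cm)
alpha = 0.05   # significance level for the one-tailed test H1: mu < 9

# Under H0, the sample mean has distribution N(mu0, sigma^2 / n)
sample_mean = NormalDist(mu0, sigma / sqrt(n))

p_value = sample_mean.cdf(x_bar)   # P(X-bar < 8.95)
reject_h0 = p_value < alpha

# Alternative: the critical value c with P(X-bar < c) = alpha
critical = sample_mean.inv_cdf(alpha)

print(round(p_value, 3), reject_h0)  # 0.034 True
print(round(critical, 3))            # 8.955
```

Both routes agree: the p-value of 0.034 is below 0.05, and equivalently the observed mean of 8.95 cm lies below the critical value of about 8.955 cm.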