CHAPTER FOUR: The Normal Distribution
Random variables whose possible values fill up an entire interval are called continuous random variables. Their
probabilities may be represented by areas under curves.
One such example is a normally distributed random variable; it has probabilities equal to areas under a normal curve.
(A normal curve looks like a “bell”.) Normal curves play a prominent role in statistics because, in life, many
populations have bell-shaped histograms. Some examples:
Example 1
The heights (in inches) of the 750 players on the 2004 Opening Day rosters in Major League Baseball are summarized
below in a frequency distribution:
Example 2
Each of five thousand people will flip a fair coin 100 times. Each person then records the number of heads that occur
in the 100 flips; this yields a data set with 5000 numbers. (This data set was generated by computer simulation.) A
partial list of the data values and the histogram is given below:
(The mean of these 5000 values was 50.014, and the standard deviation was 3.574.)
For this reason, we often use normal curves for modeling populations (for finding percentages) and random
variables (for finding probabilities).
There are infinitely many normal curves from which to choose; they can be centered anywhere on the number line,
and they can have varying degrees of spread.
A normal curve has a specific equation that depends on two parameters: one determines the number at the
center (μ), and the other determines whether the curve is “flat” or “pointy” (σ). Choose a number for μ and another
number for σ, and you can plot this curve as a function of x. (Four normal curves are pictured below.)
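The equation itself is not written out in this chapter. For reference, the standard textbook form of the normal curve with center μ and spread σ is the following (the name f for the curve's height is our own label, not the chapter's):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},
\qquad -\infty < x < \infty, \quad \sigma > 0
```

Here f(x) is the height of the curve at x. Probabilities are areas under the curve, not heights.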
Probabilities for normally distributed random variables are found by integrating the normal curve. Fortunately for
us, we won’t have to integrate; it’s already been done for us by computer and summarized on a chart. We will proceed
in four stages:
I) The Standard Normal Curve (μ=0, σ=1)
The standard normal chart gives you the area underneath the curve to the left of any number z. This is all we
will need!
Example
Find the area underneath the standard normal curve to the left of z = 1.08.
Solution
First draw the curve!!! Then look up z = 1.08 on the chart. The area is .8599.
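If you have a computer handy, the chart lookup can be checked with a few lines of Python. This is only a sketch: it uses the standard library's `statistics.NormalDist`, and the helper names `area_left`, `area_right`, and `area_between` are our own, not part of the chapter.

```python
from statistics import NormalDist

# The standard normal curve: center 0, spread 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_left(z):
    """Area under the standard normal curve to the left of z (what the chart gives)."""
    return STANDARD_NORMAL.cdf(z)


def area_right(z):
    """Area to the right of z: the total area is 1, so subtract the left area."""
    return 1.0 - area_left(z)


def area_between(z1, z2):
    """Area between z1 and z2 (with z1 < z2): left area of z2 minus left area of z1."""
    return area_left(z2) - area_left(z1)


print(round(area_left(1.08), 4))  # 0.8599
```

The "right" and "between" helpers show why the left-of-z area is all the chart needs to provide.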
PROBLEM 4.1
Find the area underneath the standard normal curve to the right of z = 0.74.
PROBLEM 4.2
Find the area underneath the standard normal curve between z1 = 0.34 and z2 = 1.90.
So if I give you the z-values, you can find the corresponding areas (to the left, right, in between). Now, I will give
you the areas and you give me the corresponding z-values!
Example
What is the z-value that has area underneath the standard normal curve to the left of it equal to .2946?
Solution
Draw the curve!!! Then look for the z-value with area = .2946.
Answer: z = -0.54.
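Going from an area back to a z-value is the chart lookup run in reverse. As a rough check, Python's `statistics.NormalDist` has an `inv_cdf` method that does this. The function names below are illustrative only.

```python
from statistics import NormalDist


def z_for_left_area(area):
    """z-value with the given area (strictly between 0 and 1) to its LEFT."""
    return NormalDist().inv_cdf(area)


def z_for_right_area(area):
    """z-value with the given area to its RIGHT: the area to its left is 1 - area."""
    return NormalDist().inv_cdf(1.0 - area)


print(round(z_for_left_area(0.2946), 2))  # -0.54
```

Because the chart is rounded to two decimal places in z, the computer's answer will usually agree with the chart only after rounding.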
PROBLEM 4.3
What is the z-value that has area underneath the standard normal curve to the right of it equal to .6874?
PROBLEM 4.4
Find the two z-values, z1 and z2, that divide the standard normal curve into a middle .80 area and two tails of .10
area apiece.
That takes care of the standard normal curve. What about the others?
II) The Normal (μ, σ) Curve
NOTES: μ determines the center of the curve; σ determines its spread. Choosing μ=0 and σ=1 gives you the
standard normal curve.
* σ must be bigger than 0. If σ is “large” the curve is flat; if σ is “small” the curve is “pointy”.
Once a normal curve is specified (by a particular choice of μ and σ), how do we find, say, the area to the left of
some number x? First standardize, turning x into a z-value:
Z = (X − μ)/σ
and then use the standard normal chart to find the area to the left of z! Two steps!!!
Example
The area underneath the normal (μ=6, σ=1.25) curve to the left of x=7.3 is equal to the area underneath the standard
normal curve to the left of z = (7.3 − 6)/1.25 = 1.04. The area is .8508.
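The two steps (standardize, then look up the area) can be sketched in Python as follows. The function names `standardize` and `area_left_of_x` are ours, and `NormalDist().cdf` stands in for the standard normal chart.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Step 1: turn an x-value into a z-value, z = (x - mu) / sigma."""
    return (x - mu) / sigma


def area_left_of_x(x, mu, sigma):
    """Step 2: area under the standard normal curve to the left of that z."""
    z = standardize(x, mu, sigma)
    return NormalDist().cdf(z)


# The worked example: normal curve with mu = 6, sigma = 1.25, and x = 7.3.
print(round(standardize(7.3, 6, 1.25), 2))     # 1.04
print(round(area_left_of_x(7.3, 6, 1.25), 4))  # 0.8508
```

For an area to the right, subtract the result from 1. For an area between two x-values, subtract the two left areas.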
PROBLEM 4.5
Find the area under the normal (μ=10, σ=4) curve to the left of x=16.8.
PROBLEM 4.6
Find the area under the normal (μ=305, σ=80) curve to the right of x=451.
PROBLEM 4.7
Find the area under the normal (μ=44, σ=3) curve between x1=39 and x2=45.
Next, I will specify the curve and give you the areas: you give me the corresponding x-values!
Example
Find the x-value that has area under the normal (μ=25, σ=6) curve to the left of it equal to .6558.
Solution
Draw the curve!!! Use the chart to find the corresponding z-value for that same area under the standard normal
curve.
You find z=0.40. Now take this z and “turn it back into an x”, using the same standardizing formula and a little
algebra:
X = σZ + μ
This is called de-standardizing. You get x = 6(0.40) + 25 = 27.40. This is your answer.
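De-standardizing is one line of arithmetic, so it is easy to check by machine. In this sketch the names `destandardize` and `x_for_left_area` are hypothetical helpers, and `inv_cdf` plays the role of reading the chart backwards.

```python
from statistics import NormalDist


def destandardize(z, mu, sigma):
    """Turn a z-value back into an x-value: x = sigma * z + mu."""
    return sigma * z + mu


def x_for_left_area(area, mu, sigma):
    """x-value on the normal (mu, sigma) curve with the given area to its left."""
    z = NormalDist().inv_cdf(area)  # Step 1: area -> z
    return destandardize(z, mu, sigma)  # Step 2: z -> x


# The worked example, using the chart value z = 0.40 with mu = 25, sigma = 6.
print(round(destandardize(0.40, 25, 6), 2))  # 27.4
```

Calling `x_for_left_area(0.6558, 25, 6)` skips the rounding of z to two decimals, so its result differs from the chart answer by a small amount in the second decimal place.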
PROBLEM 4.8
Find the x-value that has area under the normal (μ=1050, σ=80) curve to the right of it equal to .2295.
PROBLEM 4.9
Find the two x-values, x1 and x2, that divide the normal (μ=17, σ=3) curve into a middle .95 area and two tails of
.025 area apiece.
1. Finding the area under the normal (μ, σ) curve to the left/right of a given x-value:
Step 1: Standardize: Z = (X − μ)/σ
Step 2: Use the chart to find the area to the left/right of that z-value under the
standard normal curve.
2. Finding the x-value under the normal (μ, σ) curve for a given area to the left/right:
Step 1: Use the chart to find the z-value that has that same area to the left/right under the
standard normal curve.
Step 2: De-standardize: X = σZ + μ
Definition
A population is said to be normal, or normally distributed, if percentages are (approximately) equal to areas under
a suitable normal curve.
In life, many populations of interest are normal: people’s heights, weights, final grade averages, the sizes of objects
mass-produced by machines. This is for reasons we will study later.
The next example illustrates how normal curves are used to approximate an exact percentage.
Example
The SAT scores (math + verbal) for all HS seniors who took the exam in 2018 are grouped below:
PROBLEM 4.10
As reported by the U.S. National Center for Health Statistics, males between 18 and 24 years of age have a mean weight
of 175 lbs. with a standard deviation of 14 lbs. If the population of weights is normally distributed, what percentage
of these males weigh
(a) less than 155 lbs.?
(b) more than 180 lbs.?
(c) between 200 and 210 lbs.?
PROBLEM 4.11
Suppose that the GPAs for graduating students at a particular state university follow a normal distribution with a
mean GPA of 2.88 and a standard deviation of 0.34.
(a) What percentage of graduates have GPAs below 3.00?
(b) Suppose that the university gives a certificate to every student who finishes with a GPA in the top 10%.
What GPA does a student need to be eligible for such a certificate?
PROBLEM 4.12
It is the end of the semester, and Dr. Smith wants to assign letter grades to the students in his Psychology class. He
has the students’ final averages, and wants to assign grades in such a way that:
the top 15% get an ‘A’
the next 30% get a ‘B’
the next 40% get a ‘C’
the next 10% get a ‘D’
the last 5% get an ‘F’.
If the final averages constitute a normally distributed population with a mean of 67 and a standard deviation of 15,
find the cutoff points for each grade.
Definition
For any number K between 0 and 100, the Kth percentile of a population is the number for which K% of the
population values fall beneath it.
(In Problem 4.11, you found the 90th percentile. In Problem 4.12, you found the 5th, 15th, 55th, and 85th percentiles.)
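For a normal population, the Kth percentile is just the x-value with area K/100 to its left. A minimal sketch, assuming a made-up population with mean 100 and standard deviation 15 (these numbers and the name `normal_percentile` are illustrative and do not come from the problems above):

```python
from statistics import NormalDist


def normal_percentile(k, mu, sigma):
    """Kth percentile of a normal (mu, sigma) population, for 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    # K% of the population falls beneath the value with left area k/100.
    return NormalDist(mu, sigma).inv_cdf(k / 100.0)


# Hypothetical population: mean 100, standard deviation 15.
p90 = normal_percentile(90, 100, 15)
print(round(p90, 1))  # 119.2
```

By symmetry of the normal curve, the 50th percentile of a normal population is always its mean.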
Definition
A random variable X is said to be normal, or normally distributed, if its probabilities are (approximately) equal to
areas under a suitable normal curve.
If μ and σ are the mean and standard deviation of the population, respectively, then
μx = μ and σx = σ
Definition
A standard normal random variable (one that we will call Z) is a random variable whose probabilities are equal to
areas under the standard normal curve.
μz = 0 and σz = 1
*If X is a normally distributed random variable, then
Z = (X − μ)/σ
is a standard normal random variable, with probabilities that are approximately equal to areas given by your chart.
Suppose X is a randomly selected value from a normally distributed population with mean μ and standard deviation σ.
The probability that X falls between two numbers a and b, i.e. P(a < X < b), is
= the percentage of all the population’s values that fall between a and b
= the area under the normal (μ, σ) curve between x1 = a and x2 = b
= the area under the standard normal curve between z1 = (a − μ)/σ and z2 = (b − μ)/σ.
Probabilities can be found by standardizing the random variable; i.e. by turning X into Z.
Example
Suppose that X is the weight of a male selected at random from the population in Problem 4.10 (normally distributed,
mean μ = 175 lbs. and standard deviation σ = 14 lbs.). What is the
probability that he is
(a) heavier than 170 lbs.; i.e. what is P(X > 170)?
(b) between 160 and 200 lbs.; i.e. what is P(160 < X < 200)?
Solutions
(a) (b)
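The two probabilities in this example can be checked numerically. The sketch below builds the normal curve with mean 175 and standard deviation 14 directly, which is equivalent to standardizing first. The variable names are our own.

```python
from statistics import NormalDist

# Population of weights from the example: mean 175 lbs., standard deviation 14 lbs.
weights = NormalDist(mu=175, sigma=14)

# (a) P(X > 170): area to the right of 170.
p_heavier_than_170 = 1.0 - weights.cdf(170)

# (b) P(160 < X < 200): area between 160 and 200.
p_between_160_and_200 = weights.cdf(200) - weights.cdf(160)

print(round(p_heavier_than_170, 2))     # 0.64
print(round(p_between_160_and_200, 2))  # 0.82
```

Answers read from the chart use z-values rounded to two decimals, so they can differ from these in the third decimal place.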
PROBLEM 4.13
The length of life of a certain brand of refrigerator is approximately normally distributed with a mean lifetime of
12.6 years and a standard deviation of 1.6 years. Let X be the lifetime of the refrigerator you (randomly) purchase.
(a) What is P(X < 9)?
(b) What is P(X > 14)?
(c) What is P(10 < X < 12)?
PROBLEM 4.14
The homes in a large neighborhood have values that are normally distributed with mean value of $375,400 and a
standard deviation of $61,250. Let X = the value of a randomly selected home. Find P($300,000 < X < $500,000).
Normal Approximation to the Binomial
Consider one of our earlier examples, from the last pages of Chapter 3. Assume that 35% of all American families
have a pet cat, and that a sample of n=50 families will be randomly selected.
Here is the exact probability distribution for X = the number of families in the sample that have a pet cat (left, below)
and probability histogram (directly below):
The probabilities all come from the Binomial Probability Formula with n=50 and p=.35:
To find the probability that, say, X = 19, read off the height of the bar over
19. It is equal to .105.
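The height of any one bar can be reproduced with the standard binomial probability formula, P(X = k) = C(n, k) p^k (1 − p)^(n − k). A short sketch, where the function name `binomial_pmf` is our own:

```python
from math import comb


def binomial_pmf(k, n, p):
    """Exact binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that exactly 19 of the 50 sampled families have a pet cat.
print(round(binomial_pmf(19, 50, 0.35), 3))  # 0.105
```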
By fitting a normal curve to the histogram, we can approximate the probability with an area under the curve. Which
curve? Set μ = the expected value of X and σ = the standard deviation of X, with the formulas given in Chapter 4:
μx = np
σx = √(np(1 − p))
In this case, μx = np = 50(.35) = 17.5 and σx = √((50)(.35)(1 − .35)) = 3.37.
Now find the area underneath the (μ=17.5, σ=3.37) curve between 18.5 and 19.5 … see for yourself, you get .1045.
Very close!
What if you want to find the probability that at least 13 of the 50 families have a pet cat? To find the exact probability,
you would have to apply the Binomial Probability Formula many, many times (38 times!) to find:
… when in fact this is approximately the area to the right of 12.5 underneath the same (μ=17.5, σ=3.37) curve. (It is
the region in white.)
Again, see for yourself, you will get .9306. And yet again, very close!
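To see for yourself how close the approximation is, you can let a computer do the 38 applications of the binomial formula and compare the sum with the normal-curve area. This is a sketch under our own naming (`exact_at_least_13`, `approx_at_least_13`). It uses unrounded z-values, so the approximate area can differ from the chart answer in the last decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.35

# Center and spread of the fitted normal curve.
mu = n * p                      # 17.5
sigma = sqrt(n * p * (1 - p))   # about 3.37

# Exact: add up the binomial probabilities for X = 13, 14, ..., 50.
exact_at_least_13 = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(13, n + 1)
)

# Approximate: area to the right of 12.5 under the normal (mu, sigma) curve.
approx_at_least_13 = 1.0 - NormalDist(mu, sigma).cdf(12.5)

print(round(sigma, 2))              # 3.37
print(round(approx_at_least_13, 3)) # 0.931
print(abs(exact_at_least_13 - approx_at_least_13) < 0.01)  # True
```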
Keep in mind that a Binomial random variable X is discrete, and we are using a continuous curve for the
approximation. This requires that we add and/or subtract 0.5 from the relevant value(s), to approximate the area of
each bar in the histogram:
• To approximate P(X = t), find area between t − 0.5 and t + 0.5
• To approximate P(X ≤ t), find area to the left of t + 0.5
• To approximate P(X ≥ t), find area to the right of t − 0.5
• To approximate P(t1 ≤ X ≤ t2), find area between t1 − 0.5 and t2 + 0.5
Notice that you add or subtract in the opposite direction of the desired region, so that you don’t omit half of the
imaginary bar’s area.
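The four rules above translate directly into four small functions. This is one possible sketch: the function names are hypothetical, and `fitted_normal` simply builds the normal curve with μ = np and σ = √(np(1 − p)).

```python
from math import sqrt
from statistics import NormalDist


def fitted_normal(n, p):
    """Normal curve fitted to a Binomial(n, p) histogram."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))


def approx_equal(t, n, p):
    """P(X = t): area between t - 0.5 and t + 0.5."""
    curve = fitted_normal(n, p)
    return curve.cdf(t + 0.5) - curve.cdf(t - 0.5)


def approx_at_most(t, n, p):
    """P(X <= t): area to the left of t + 0.5."""
    return fitted_normal(n, p).cdf(t + 0.5)


def approx_at_least(t, n, p):
    """P(X >= t): area to the right of t - 0.5."""
    return 1.0 - fitted_normal(n, p).cdf(t - 0.5)


def approx_between(t1, t2, n, p):
    """P(t1 <= X <= t2): area between t1 - 0.5 and t2 + 0.5."""
    curve = fitted_normal(n, p)
    return curve.cdf(t2 + 0.5) - curve.cdf(t1 - 0.5)


# The pet-cat example: n = 50 families, p = .35.
print(round(approx_at_least(13, 50, 0.35), 3))  # 0.931
```

Note that `approx_at_most(12, ...)` and `approx_at_least(13, ...)` share the boundary 12.5, so the two areas add up to 1, just as the two exact probabilities do.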
PROBLEM 4.15
Steve is a basketball player with a 72% free-throw shooting percentage. Suppose he attempts 125 free-throws at the
end of practice. Use the Normal Approximation to the Binomial to find the probability that Steve
(a) makes exactly 100 of his attempts
(b) makes 86 or fewer of his attempts
(c) makes at least 100 of his attempts
(d) makes between 78 and 95 of his attempts, inclusive
(NOTE: “Inclusive” means that 78 and 95 are included!)
PROBLEM 4.16
Polls suggest that 20% of all American adults smoke cigarettes. Use the Normal Approximation to the Binomial to
find the probability that, in a sample of 750 Americans,
(a) exactly 135 of them smoke
(b) at least 140 smoke
(c) between 135 and 145 smoke
PROBLEM 4.17
We know that the probability of rolling “doubles” on any roll of a pair of balanced dice is 6/36 = .167. If you roll the
dice 300 times, use the Normal Approximation to the Binomial to find the probability of getting
(a) at least 70 doubles
(b) no more than 42 doubles (i.e. 42 or fewer)
(c) between 45 and 55 doubles, inclusive
It should be noted that, for the approximation to be a good one, the number of trials (n) can’t be too small, although
how small depends additionally on the probability of success (p). The rule of thumb is that it works well whenever
np > 5 and n(1-p) > 5. (I’ll tell you when to use it.)
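The rule of thumb is easy to encode as a quick check before using the approximation. The function name below is our own, and the second call uses made-up numbers purely for illustration.

```python
def normal_approximation_ok(n, p):
    """Rule of thumb: the approximation works well when np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


print(normal_approximation_ok(50, 0.35))  # True  (np = 17.5, n(1-p) = 32.5)
print(normal_approximation_ok(10, 0.10))  # False (np = 1)
```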