CHAPTER FOUR: The Normal Distribution

Random variables whose possible values fill up an entire interval are called continuous random variables. Their
probabilities may be represented by areas under curves.

One such example is a normally distributed random variable; it has probabilities equal to areas under a normal curve.
(A normal curve looks like a “bell”.) Normal curves play a prominent role in statistics because, in life, many
populations have bell-shaped histograms. Some examples:

Example 1
The heights (in inches) of the 750 players on the 2004 Opening Day rosters in Major League Baseball are summarized
below in a frequency distribution:

Example 2
Each of five thousand people will flip a fair coin 100 times. Each person then records the number of heads that occur
in the 100 flips; this yields a data set with 5000 numbers. (This data set was generated by computer simulation.) A
partial list of the data values and the histogram is given below:

(The mean of these 5000 values was 50.014, and the standard deviation was 3.574.)

For this reason, we often use normal curves for modeling populations (for finding percentages) and random
variables (for finding probabilities).

There are infinitely many normal curves from which to choose; they can be centered anywhere on the number line,
and they can have varying degrees of spread.

A normal curve has a specific equation that depends on two parameters; one determines the number in the center (μ),
the other determines whether the curve is “flat” or “pointy” (σ). Choose a number for μ and another number for σ,
and you can plot this curve as a function of x. (Four normal curves are pictured below.)
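The equation itself is not written out in these notes. For reference, the standard formula for the height of a normal curve with center mu and spread sigma is f(x) = 1/(sigma·√(2π)) · e^(–(x – mu)²/(2·sigma²)). Here is a minimal Python sketch of it; the function name normal_curve is ours, not something from the text.

```python
import math

def normal_curve(x, mu, sigma):
    """Height of the normal curve with center mu and spread sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be bigger than 0")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Heights of the mu=0, sigma=1 curve at the center and at two mirror-image points.
peak = normal_curve(0.0, 0.0, 1.0)
left = normal_curve(-1.5, 0.0, 1.0)
right = normal_curve(1.5, 0.0, 1.0)
print(peak, left, right)
```

Evaluating the function over a grid of x-values with different choices of mu and sigma is one way to reproduce pictures like the ones described above: a larger sigma lowers and widens the curve, a smaller sigma makes it taller and narrower.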

Probabilities for normally distributed random variables are found by integrating the normal curve. Fortunately for
us, we won’t have to integrate; it’s already been done for us by computer and summarized on a chart. We will proceed
in four stages:

1) Finding areas under one particular (standard) normal curve
2) Finding areas under any normal curve (for any choice of μ, σ)
3) Finding percentages for normally distributed populations
4) Finding probabilities for normally distributed random variables

I) The Standard Normal Curve

Properties of the Standard Normal Curve:

1. The total area underneath the curve is equal to 1.
2. The curve extends infinitely on both sides.
3. The curve is symmetric about 0.
4. “Most” of the area underneath the curve is between –3 and 3.
*The equation of the standard normal curve can be obtained by choosing μ=0 and σ=1.

Q: How do we find areas underneath this curve?


A: By using the standard normal chart!

The standard normal chart gives you the area underneath the curve to the left of any number z. This is all we
will need!

Example
Find the area underneath the standard normal curve to the left of z = 1.08.

Solution
First draw the curve!!! Then look up z = 1.08 on the chart.

This gives an area of .8599. That is the answer.
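If you don't have the chart handy, the same area can be computed directly. The sketch below is one way to do it in Python, assuming the standard identity that the area to the left of z equals 0.5·(1 + erf(z/√2)); the helper names area_left, area_right and area_between are ours.

```python
import math

def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_right(z):
    """Area to the right of z: the total area is 1, so subtract the left area."""
    return 1.0 - area_left(z)

def area_between(z1, z2):
    """Area between z1 and z2 (with z1 < z2): a difference of two left areas."""
    return area_left(z2) - area_left(z1)

print(f"{area_left(1.08):.4f}")  # 0.8599, matching the chart
```

Notice that "to the right" and "in between" never need a separate chart: both are built from areas to the left.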

PROBLEM 4.1
Find the area underneath the standard normal curve to the right of z = 0.74.

PROBLEM 4.2
Find the area underneath the standard normal curve between z1 = 0.34 and z2 = 1.90.

So if I give you the z-values, you can find the corresponding areas (to the left, right, in between). Now, I will give
you the areas and you give me the corresponding z-values!

Example
What is the z-value that has area underneath the standard normal curve to the left of it equal to .2946?

Solution
Draw the curve!!! Then look for the z-value with area = .2946.

Answer: z = -0.54.
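Reading the chart "backwards" like this can also be done by computer. A minimal sketch, assuming Python's statistics.NormalDist stands in for the chart; the helper name z_for_left_area is ours.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)  # the standard normal curve

def z_for_left_area(area):
    """z-value that has the given area (strictly between 0 and 1) to its left."""
    return standard.inv_cdf(area)

z = z_for_left_area(0.2946)
print(round(z, 2))  # -0.54
```

If you are given an area to the right instead, convert it to an area to the left first (1 minus the area) and then do the same lookup.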

PROBLEM 4.3
What is the z-value that has area underneath the standard normal curve to the right of it equal to .6874?

PROBLEM 4.4
Find the two z-values, z1 and z2, that divide the standard normal curve into a middle .80 area and two tails of .10
area apiece.

That takes care of the standard normal curve. What about the others?

II) The Normal (μ,σ) Curve

Properties of all normal curves:

1. The total area underneath the curve is equal to 1.
2. The curve extends infinitely on both sides.
3*. The curve is symmetric about μ.
4*. “Most” of the area underneath is between μ–3σ and μ+3σ.

NOTES: μ determines the center of the curve; σ determines its spread. Choosing μ=0 and σ=1 gives you the
standard normal curve.

*σ must be bigger than 0. If σ is “large” the curve is flat; if σ is “small” the curve is “pointy”.

Once a normal curve is specified (by a particular choice of μ and σ), how do we find, say, the area to the left of
some number x?

We standardize x; i.e. “turn it into a z”

Z = (X – μ)/σ

and then use the standard normal chart to find the area to the left of z! Two steps!!!

Example
The area underneath the normal (μ=6, σ=1.25) curve to the left of x=7.3 is equal to the area underneath the standard
normal curve to the left of z = (7.3 – 6)/1.25 = 1.04. The area is .8508.
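The two steps in this example (standardize, then look up the area) can be sketched in Python as follows. This assumes statistics.NormalDist plays the role of the chart; the function name standardize is ours.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Step 1: turn an x-value into a z-value."""
    return (x - mu) / sigma

z = standardize(7.3, mu=6.0, sigma=1.25)
area = NormalDist().cdf(z)  # Step 2: area to the left of z under the standard normal curve

print(f"z = {z:.2f}, area = {area:.4f}")  # z = 1.04, area = 0.8508
```

As a cross-check, NormalDist(6.0, 1.25).cdf(7.3) gives the same area without standardizing by hand, which is exactly the point: the two areas are equal.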

PROBLEM 4.5
Find the area under the normal (μ=10, σ=4) curve to the left of x=16.8.

PROBLEM 4.6
Find the area under the normal (μ=305, σ=80) curve to the right of x=451.

PROBLEM 4.7
Find the area under the normal (μ=44, σ=3) curve between x1=39 and x2=45.

Next, I will specify the curve and give you the areas: you give me the corresponding x-values!

Example
Find the x-value that has area under the normal (μ=25, σ=6) curve to the left of it equal to .6558.

Solution
Draw the curve!!! Use the chart to find the corresponding z-value for that same area under the standard normal
curve.

You find z=0.40. Now take this z and “turn it back into an x”, using the same standardizing formula and a little
algebra:

X = σZ + μ

This is called de-standardizing. You get x = 6(0.40)+25 = 27.40. This is your answer.
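Here is the same example as a short Python sketch: find the z-value for the given area, then de-standardize. It assumes statistics.NormalDist stands in for the chart and rounds z to two decimals to mimic a chart lookup; the function name destandardize is ours.

```python
from statistics import NormalDist

def destandardize(z, mu, sigma):
    """Turn a z-value back into an x-value on the normal curve with center mu and spread sigma."""
    return sigma * z + mu

# Step 1: z-value with area .6558 to its left, rounded to two decimals like the chart.
z = round(NormalDist().inv_cdf(0.6558), 2)

# Step 2: de-standardize with mu = 25 and sigma = 6.
x = destandardize(z, mu=25.0, sigma=6.0)

print(f"z = {z:.2f}, x = {x:.2f}")  # z = 0.40, x = 27.40
```

Without the rounding step, the computer's z is slightly different from the chart's nearest entry, so x would differ from 27.40 in the second or third decimal place.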

PROBLEM 4.8
Find the x-value that has area under the normal (μ=1050, σ=80) curve to the right of it equal to .2295.

PROBLEM 4.9
Find the two x-values, x1 and x2, that divide the normal (μ=17, σ=3) curve into a middle .95 area and two tails of
.025 area apiece.

SUMMARY: So far, we see two types of problems

1. Finding the area under the normal (μ,σ) curve to the left/right of a given x-value:

Step 1: Standardize: Z = (X – μ)/σ


Step 2: Use the chart to find the area under standard normal curve to the left/right of z.

2. Finding the x-value under the normal (μ,σ) curve for a given area to the left/right

Step 1: Use the chart to find the z-value that has that same area to the left/right under the
standard normal curve.
Step 2: De-standardize: X = σZ + μ

Now we apply these techniques to finding percentages and probabilities.

III) Normal Populations

Definition
A population is said to be normal, or normally distributed, if percentages are (approximately) equal to areas under
a suitable normal curve.

Q: When is this true?


A: Whenever the population has a bell-shaped histogram!

Q: Which normal (μ,σ) curve is “suitable”?

A: Choose μ = mean of the population and σ = standard deviation of the population

In life, many populations of interest are normal: people’s heights, weights, final grade averages, the sizes of objects
mass-produced by machines. This is for reasons we will study later.

The next example illustrates how normal curves are used to approximate an exact percentage.

Example
The SAT scores (math + verbal) for all HS seniors who took the exam in 2018 are grouped below:

total # test-takers = 1,494,531
mean score = 1017
st. dev. of scores = 210

Score   Freq      Score   Freq
1600    1,206     990     27,199
1590    534       980     27,960
1580    583       970     27,834
1570    1,028     960     27,368
1560    1,001     950     26,834
1550    1,380     940     26,268
1540    1,829     930     27,005
1530    1,219     920     26,272
1520    2,114     910     25,647
1510    2,296     900     25,516
1500    2,668     890     24,221
1490    2,862     880     24,386
1480    3,099     870     23,388
1470    3,371     860     23,125
1460    3,723     850     22,209
1450    4,163     840     21,061
1440    4,542     830     21,145
1430    4,935     820     19,158
1420    5,300     810     18,892
1410    5,650     800     17,582
1400    6,227     790     16,889
1390    6,656     780     15,788
1380    6,994     770     15,183
1370    7,537     760     14,415
1360    8,419     750     13,496
1350    8,739     740     12,095
1340    9,557     730     11,710
1330    10,254    720     10,925
1320    10,454    710     10,192
1310    11,352    700     9,234
1300    12,154    690     8,643
1290    12,510    680     7,808
1280    13,367    670     7,487
1270    13,599    660     6,623
1260    14,183    650     6,344
1250    15,486    640     5,472
1240    15,774    630     5,134
1230    16,930    620     4,693
1220    17,414    610     4,398
1210    17,966    600     3,769
1200    18,570    590     3,329
1190    19,293    580     3,302
1180    19,471    570     2,949
1170    20,874    560     2,433
1160    20,741    550     2,460
1150    21,580    540     2,072
1140    22,314    530     1,758
1130    22,922    520     1,621
1120    23,572    510     1,498
1110    23,816    500     1,369
1100    24,425    490     972
1090    24,966    480     1,100
1080    25,246    470     763
1070    26,107    460     538
1060    25,465    450     476
1050    27,094    440     455
1040    26,237    430     346
1030    27,772    420     332
1020    27,754    410     220
1010    27,194    400     568
1000    28,114

We can use the given frequencies to find the exact percentage of test-takers who scored at least 1150:

= (sum of frequencies of 1150 and greater) divided by 1,494,531
= .274 or 27.4%.

We can get a really good approximation to the above percentage, using only the mean and standard deviation:

*Set μ=1017 and σ=210.

Percentage ≈ area under the normal (μ=1017, σ=210) curve to the right of X=1150
= area under the standard normal curve to the right of z = (1150 – 1017)/210 = 0.63

area = .264 or 26.4%. (really close!)
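The normal approximation for the SAT example (the percentage scoring at least 1150, using only the mean 1017 and standard deviation 210) can be checked by computer instead of by chart. A minimal sketch in Python, assuming statistics.NormalDist stands in for the chart; rounding z to two decimals mimics the chart lookup.

```python
from statistics import NormalDist

mu, sigma = 1017, 210     # mean and st. dev. of the SAT scores, as given in the example
cutoff = 1150

# Standardize the cutoff and round to two decimals, as you would before using the chart.
z = round((cutoff - mu) / sigma, 2)

# Area to the right of z = 1 minus the area to the left of z.
approx = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}, approximate percentage = {approx:.3f}")  # z = 0.63, approximate percentage = 0.264
```

Only two numbers about the population went into this calculation, yet the result lands within about one percentage point of the exact 27.4% obtained from the full frequency table.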

PROBLEM 4.10
As reported by the U.S. National Center for Health Statistics, males between 18-24 years of age have a mean weight
of 175 lbs. with a standard deviation of 14 lbs. If the population of weights is normally distributed, what percentage
of these males weigh
(a) less than 155 lbs.?
(b) more than 180 lbs.?
(c) between 200 and 210 lbs.?

PROBLEM 4.11
Suppose that the GPAs for graduating students at a particular state university follow a normal distribution with a
mean GPA of 2.88 and a standard deviation of 0.34.
(a) What percentage of graduates have GPAs below 3.00?
(b) Suppose that the university gives a certificate to every student who finishes with a GPA in the top 10%.
What GPA does a student need to be eligible for such a certificate?

PROBLEM 4.12
It is the end of the semester, and Dr. Smith wants to assign letter grades to the students in his Psychology class. He
has the students’ final averages, and wants to assign grades in such a way that:
the top 15% get an ‘A’
the next 30% get a ‘B’
the next 40% get a ‘C’
the next 10% get a ‘D’
the last 5% get an ‘F’.
If the final averages constitute a normally distributed population with a mean of 67 and a standard deviation of 15,
find the cutoff points for each grade.

Definition
For any number K between 0 and 100, the Kth percentile of a population is the number for which K% of the
population values fall beneath it.
(In Problem 4.11, you found the 90th percentile. In Problem 4.12, you found the 5th, 15th, 55th, and 85th percentiles.)
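For a normal population, finding the Kth percentile is the de-standardizing problem from Section II with area K/100 to the left. A minimal sketch in Python, assuming statistics.NormalDist stands in for the chart; the function name percentile is ours, and the SAT mean and standard deviation from the earlier example are reused purely as an illustration (treating that population as normal).

```python
from statistics import NormalDist

def percentile(k, mu, sigma):
    """Kth percentile of a normal population with mean mu and st. dev. sigma (0 < k < 100)."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(k / 100.0)   # z-value with area k/100 to its left
    return sigma * z + mu                 # de-standardize: X = sigma*Z + mu

# Illustration: selected percentiles for a normal population with mean 1017, st. dev. 210.
for k in (10, 50, 90):
    print(k, round(percentile(k, 1017, 210), 1))
```

The 50th percentile comes out equal to the mean, which is what the symmetry of the normal curve says it should be.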

IV) Normal Random Variables

Definition
A random variable X is said to be normal, or normally distributed, if its probabilities are (approximately) equal to
areas under a suitable normal curve.

Q: When does this occur?


A: whenever X is a randomly selected value from a normally distributed population!

When selecting at random, probabilities = percentages!

If μ and σ are the mean and standard deviation of the population, respectively, then

μx = μ and σx = σ

Definition
A standard normal random variable (one that we will call Z) is a random variable whose probabilities are equal to
areas under the standard normal curve.

μz = 0 and σz = 1

*if X is a normally distributed random variable, then

Z = (X – μ)/σ

is a standard normal random variable, with probabilities that are approximately equal to areas given by your chart.

Suppose X is a randomly selected value from a normally dist. population with mean μ, standard deviation σ.

The probability that X falls between two numbers a and b, i.e. P(a < X < b) is
= the percentage of all the population’s values that fall between a and b
= the area under the normal (μ,σ) curve between x1=a and x2=b.
= the area under the standard normal curve between z1 = (a – μ)/σ and z2 = (b – μ)/σ.

Probabilities can be found by standardizing the random variable; i.e. by turning X into Z.

Example
Suppose that X is the weight of a randomly selected member of the population of males in Problem 4.10 (normally
distributed, mean = 175 lbs. and standard deviation = 14 lbs.). Suppose that a single male is selected at random.
What is the probability that he is
(a) heavier than 170 lbs.; i.e. what is P(X > 170)?
(b) between 160 and 200 lbs.; i.e. what is P(160 < X < 200)?

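Solutions
(a) P(X > 170) = P(Z > (170 – 175)/14) = P(Z > –0.36) = .6406 or 64.06% chance
(b) P(160 < X < 200) = P((160 – 175)/14 < Z < (200 – 175)/14) = P(–1.07 < Z < 1.79) = .8210 or 82.10% chance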
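Both answers in this example come from standardizing and then using the chart. Here is a minimal Python sketch of that calculation, assuming statistics.NormalDist stands in for the chart; the helper name z_score is ours, and it rounds to two decimals the way you would before looking up a z-value.

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal: mean 0, standard deviation 1
mu, sigma = 175.0, 14.0     # mean and st. dev. of the weights, in lbs.

def z_score(x):
    """Standardize x and round to two decimals, as for a chart lookup."""
    return round((x - mu) / sigma, 2)

# (a) P(X > 170): area to the right of the standardized value
p_a = 1.0 - Z.cdf(z_score(170))

# (b) P(160 < X < 200): area between the two standardized values
p_b = Z.cdf(z_score(200)) - Z.cdf(z_score(160))

print(f"(a) {p_a:.4f}   (b) {p_b:.4f}")  # (a) 0.6406   (b) 0.8210
```

Skipping the rounding in z_score gives answers that differ from these in the third decimal place; the chart answers above are the ones to report in this course.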

PROBLEM 4.13
The length of life of a certain brand of refrigerator is approximately normally distributed with a mean lifetime of
12.6 years and a standard deviation of 1.6 years. Let X be the lifetime of the refrigerator you (randomly) purchase.
(a) What is P(X < 9)?
(b) What is P(X > 14)?
(c) What is P(10 < X < 12)?

PROBLEM 4.14
The homes in a large neighborhood have values that are normally distributed with mean value of $375,400 and a
standard deviation of $61,250. Let X = the value of a randomly selected home. Find P($300,000 < X < $500,000).

Normal Approximation to the Binomial

Consider one of our earlier examples, from the last pages of Chapter 3. Assume that 35% of all American families
have a pet cat, and that a sample of n=50 families will be randomly selected.

Here is the exact probability distribution for X = the number of families in sample that have a pet cat (left, below)
and probability histogram (directly below):

The probabilities all come from the Binomial Probability Formula with n=50 and p=.35:

P(X) = (50 C X)(.35)^X (1–.35)^(50–X)

To find the probability that, say, X=19, you can read it off above as the height of the bar over
19. It is equal to .105.

P(X=19) = (50 C 19)(.35)^19 (1–.35)^(50–19) = .105
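The exact probability P(X=19) = .105 can be reproduced with a few lines of Python. This is a minimal sketch of the Binomial Probability Formula using math.comb for the "n choose x" count; the function name binomial_pmf is ours.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob_19 = binomial_pmf(19, n=50, p=0.35)
print(f"{prob_19:.3f}")  # 0.105
```

Calling binomial_pmf for every x from 0 to 50 gives the whole probability distribution, and those 51 values add up to 1.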

By fitting a normal curve to the histogram, we can approximate the probability with an area under the curve. Which
curve? Set μ = the expected value of X and σ = the standard deviation of X, with the formulas given in Chapter 4:

μX = np
σX = √(np(1 − p))

In this case, μX = np = 50(.35) = 17.5 and σX = √((50)(.35)(1–.35)) = 3.37.

Now find the area underneath the (μ=17.5, σ=3.37) curve between 18.5 and 19.5 … see for yourself, you get .1045.
Very close!

What if you want to find the probability that at least 13 of the 50 families have a pet cat? To find the exact probability,
you would have to apply the Binomial Probability Formula many, many times (38 times!) to find:

P(X=13) + P(X=14) + P(X=15) + P(X=16) + ………………….. + P(X = 49) + P(X=50) =.9339

… when in fact this is approximately the area to the right of 12.5 underneath the same (μ=17.5, σ=3.37) curve. (It is
the region in white.)

Again, see for yourself, you will get .9306. And yet again, very close!
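To "see for yourself" by computer, here is a minimal Python sketch that reproduces both approximations for the pet cat example (n=50, p=.35) and also adds up the exact Binomial probabilities for comparison. It assumes statistics.NormalDist stands in for the chart and rounds each z to two decimals to mimic a chart lookup; the helper name chart_area_left is ours.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.35
mu = n * p                       # expected value of X: 17.5
sigma = sqrt(n * p * (1 - p))    # standard deviation of X: about 3.37
Z = NormalDist()                 # standard normal curve

def chart_area_left(x):
    """Area to the left of x under the fitted normal curve, with z rounded to 2 decimals."""
    z = round((x - mu) / sigma, 2)
    return Z.cdf(z)

# P(X = 19) is approximated by the area between 18.5 and 19.5
approx_19 = chart_area_left(19.5) - chart_area_left(18.5)

# P(X >= 13) is approximated by the area to the right of 12.5
approx_13_plus = 1.0 - chart_area_left(12.5)

# Exact P(X >= 13): add up the Binomial probabilities for X = 13, 14, ..., 50
exact_13_plus = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(13, n + 1))

print(f"{approx_19:.4f} {approx_13_plus:.4f} {exact_13_plus:.4f}")
```

The first two printed values should match the chart answers .1045 and .9306; the third is the 38-term exact sum, which the computer does not mind doing for you.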

Keep in mind that a Binomial random variable X is discrete, and we are using a continuous curve for the
approximation. This requires that we add and/or subtract 0.5 from the relevant value(s), to approximate the area of
each bar in the histogram:
• To approximate P(X = t), find area between t – 0.5 and t + 0.5
• To approximate P(X ≤ t), find area to the left of t + 0.5
• To approximate P(X ≥ t), find area to the right of t – 0.5
• To approximate P(t1 ≤ X ≤ t2), find area between t1 – 0.5 and t2 + 0.5

Notice that you add or subtract in the opposite direction of the desired region, so that you don’t omit half of the
imaginary bar’s area.

The bar over the possible value t whose area we are approximating starts at t–0.5 and ends at t+0.5
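The four add-or-subtract-0.5 rules for approximating P(X = t), P(X ≤ t), P(X ≥ t) and P(t1 ≤ X ≤ t2) can be written as four small functions. This is a sketch only, assuming Python's statistics.NormalDist in place of the chart (so z-values are not rounded and answers may differ slightly from chart answers); all function names are ours.

```python
from math import sqrt
from statistics import NormalDist

def fitted_curve(n, p):
    """Normal curve fitted to a Binomial(n, p): mu = np, sigma = sqrt(np(1-p))."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def approx_equal(t, n, p):
    """P(X = t): area between t - 0.5 and t + 0.5."""
    curve = fitted_curve(n, p)
    return curve.cdf(t + 0.5) - curve.cdf(t - 0.5)

def approx_at_most(t, n, p):
    """P(X <= t): area to the left of t + 0.5."""
    return fitted_curve(n, p).cdf(t + 0.5)

def approx_at_least(t, n, p):
    """P(X >= t): area to the right of t - 0.5."""
    return 1.0 - fitted_curve(n, p).cdf(t - 0.5)

def approx_between(t1, t2, n, p):
    """P(t1 <= X <= t2), inclusive: area between t1 - 0.5 and t2 + 0.5."""
    curve = fitted_curve(n, p)
    return curve.cdf(t2 + 0.5) - curve.cdf(t1 - 0.5)

# Pet cat example again: at least 13 of the 50 families
print(round(approx_at_least(13, 50, 0.35), 3))
```

A quick sanity check on the rules: "at most t" and "at least t+1" together cover every possible value, and the two functions split the curve at the same point t + 0.5, so their answers always add to 1.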

PROBLEM 4.15
Steve is a basketball player with a 72% free-throw shooting percentage. Suppose he attempts 125 free-throws at the
end of practice. Use the Normal Approximation to the Binomial to find the probability that Steve
(a) makes exactly 100 of his attempts
(b) makes 86 or fewer of his attempts
(c) makes at least 100 of his attempts
(d) makes between 78 and 95 of his attempts, inclusive
(NOTE: “Inclusive” means that 78 and 95 are included!)

PROBLEM 4.16
Polls suggest that 20% of all American adults smoke cigarettes. Use the Normal Approximation to the Binomial to
find the probability that, in a sample of 750 Americans,
(a) exactly 135 of them smoke
(b) at least 140 smoke
(c) between 135 and 145 smoke

PROBLEM 4.17
We know that the probability of rolling “doubles” on any roll of a pair of balanced dice is 6/36 = .167. If you roll the
dice 300 times, use the Normal Approximation to the Binomial to find the probability of getting
(a) at least 70 doubles
(b) no more than 42 doubles (i.e. 42 or fewer)
(c) between 45 and 55 doubles, inclusive

It should be noted that, for the approximation to be a good one, the number of trials (n) can’t be too small, although
how small depends additionally on the probability of success (p). The rule of thumb is that it works well whenever
np > 5 and n(1-p) > 5. (I’ll tell you when to use it.)
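The rule of thumb is easy to check before you start. A tiny Python sketch; the function name normal_approx_ok is ours, and the second call uses made-up numbers (n=10) only to show a case that fails.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: the approximation works well when np > 5 and n(1-p) > 5."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_ok(50, 0.35))   # pet cat example: np = 17.5, n(1-p) = 32.5
print(normal_approx_ok(10, 0.35))   # hypothetical small sample: np = 3.5
```

The first call prints True and the second prints False, since with only 10 trials np falls below 5.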
