Introduction of Statistics
net/publication/329772871
1 author:
Z. A. Al-Hemyari
University of Nizwa
All content following this page was uploaded by Z. A. Al-Hemyari on 19 December 2018.
Department of
Section : Statistics
STAT101: Introduction to Statistics
Lecture/Tutorial (3:2) 4CR
STAT101
Instructor :
• Introduction
1.1 Examples of statistical problems
1.2 Sources of data
1.3 Statistical data terminology
1.4 The acquisition of data: Surveys and experiments
1.5 Obtaining data
1.6 Constructing questionnaires and schedules
1.7 Variables and scales of measurement
Tutorial 1
• Frequency Distributions
2.1 Introduction
2.2 Frequency distributions
2.3 Graphical presentations
Tutorial 2
• Measures of Location
3.1 Introduction
3.2 The mean
3.3 The mean of a distribution
3.4 The coding method
3.5 The mode
3.6 The median
3.7 Other numerical measures
3.7.1 Geometric mean
3.7.2 Quartiles and Percentiles
• Measures of Variation
4.1 Introduction
4.2 The range
4.3 The standard deviation
4.4 Measure of relative variation
4.5 Measure of skewness
Tutorial 4
• Regression Analysis
7.1 Introduction
7.2 Relationships between variables
7.3 Simple linear regression model
7.4 Fitting of a simple linear regression model
Tutorial 7
Probability Tables
1. Binomial cumulative distribution function.
2. Standard Normal distribution function
References
Introduction
to
Statistics
1
Introduction
1.1 Examples of Statistical Problems
The most important technological advance in this area has, of course,
been the development of the electronic digital computer. Statistical concepts
and methods, and the use of computers in statistical analyses, have affected
virtually all disciplines: biology, physics, engineering, economics, sociology,
psychology, business, and others. In business and economics, the
development and application of statistical methods have led to greater
production efficiency, to better forecasting techniques, and to better
management practices. It is becoming increasingly apparent that some
knowledge of statistics and computers is essential for careers in economics,
business, administration, and many other fields as well. To gain an
appreciation for the breadth of applications of statistics to business and
economic problems in particular, let us consider three examples.
Example 1.1
In operations management, a primary concern is controlling the quality of
the items being produced. If the product is a transistor radio battery, for
example, we may be concerned with the longevity of the batteries. Suppose
it is desired that at least 95 percent of the batteries last through at least 20
hours of continuous use. The actual percentage of batteries lasting more
than 20 hours could be determined by inserting each and every battery
produced into a transistor radio and recording its time to failure, but then
there would be no batteries to sell. Rather, a manager may wisely decide in a
day's production to pull every 100th battery off the production line, insert
the sampled batteries in electrical test circuits and record their times to
failure. The percentage of these batteries lasting through more than 20 hours
of continuous use could be used to estimate the percentage of all batteries
produced during that day which will last more than 20 hours. Moreover, if
this estimated percentage drops much below 95 percent (say to 80 percent),
the manager may wish to stop the production line until he can determine
why the percentage of bad batteries appears to be greater than the tolerated 5
percent. The manager is using a percentage statistic computed from a sample
of all batteries produced to arrive at a decision regarding the quality of the
set of all batteries produced on a given day. This example portrays a common
phenomenon in quality control: destructive sampling. It is impossible to test
the quality (longevity) of each battery produced because the test for
longevity will ordinarily involve its destruction.
The manager has little recourse but to sacrifice a small number of
batteries (the sample) in order to gain information about the entire set of
batteries comprising the daily production (the population).
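The sampling scheme of Example 1.1 can be sketched in a few lines of Python. This is only an illustration: the function names and the ten times to failure are hypothetical, chosen so that the estimate falls to the 80 percent mentioned in the example.

```python
def systematic_sample(units, step=100):
    """Return every `step`-th unit (the 100th, 200th, ...) of a production run."""
    return units[step - 1::step]


def proportion_lasting(lifetimes, threshold_hours=20.0):
    """Fraction of tested batteries whose time to failure exceeds the threshold."""
    if not lifetimes:
        raise ValueError("the sample is empty")
    return sum(t > threshold_hours for t in lifetimes) / len(lifetimes)


# Hypothetical times to failure (hours) of the batteries pulled off the line.
sampled_lifetimes = [23.5, 21.0, 19.2, 25.1, 22.4, 20.6, 24.8, 18.9, 21.7, 22.0]

estimate = proportion_lasting(sampled_lifetimes)
print(f"Estimated share lasting more than 20 hours: {estimate:.0%}")
if estimate < 0.95:
    print("Below the 95 percent target: the production line should be examined.")
```

The sample proportion is the statistic; the proportion for the whole day's production is the population quantity it estimates.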
Example 1.2
Example 1.3
Politicians and their supporters are immensely interested in knowing
their prospects of winning an election as the campaign heads towards
final balloting. By sampling 1,000 registered voters prior to the election, the
percentage who claim they will vote for a given candidate may be used to
estimate the percentage of the votes the candidate will receive in the
election. The estimated percentage could be used to decide, for example,
whether a greater campaign effort (more money) is required to assure the
candidate's election. There are many more examples in business and other
areas which might be cited, but the above five should indicate the many
ways in which statistics can be employed. In the first two examples,
statistics is used to describe large bodies of data. In this application, the
word "statistic" is being used to describe a specific numerical quantity such
as an average or a total, and the collection of statistics is used to summarize
or condense a large set of numbers. These compiled statistics may, in turn,
be used to assist in decision making. In the last three examples, statistics
may be interpreted in a much broader sense; namely, the process of drawing
conclusions about an entire population or collection of things based upon a
sample, a subset of the population or collection.
Most students probably view statistics in the context of the first example
above; that is, as tables of figures, charts, and graphs (batting averages, pie
charts illustrating the sources of government revenue, and so on). This
concept is called descriptive statistics and was at one time the principal use
of statistics in business. Currently, there is an increasing interest in the
methods and uses of inferential statistics, the process of drawing inferences
about the whole (the population) from a subset of it (the sample), as
exemplified in Examples 1.1-1.3. Schematically, the process of drawing
inferences about an unknown population numerical quantity (the proportion
of defectives in a production lot, mean incomes of a class of laborers, etc.)
is illustrated in figure 1.1. Units are selected from the population to form the
sample which in turn is used to draw inferences about the population
characteristic of interest. Much of this text is devoted to the study of
statistical inference. In subsequent sections of this chapter, we will focus
attention on the sources of data, methods of obtaining data, and data
measurement considerations.
We mean:
sources of data outside the firm. External data may be of two types:
primary data and secondary data. By primary data, we mean data obtained
from the organization which originally collected them. An example is the
population data collected by and available from the U.S. Bureau of the
Census. Secondary data come from a source other than the one which
originally collected them.
which typically will discuss any restrictions placed on the data due to the
process of their collection. Thus, while secondary sources of data are
convenient, it usually is prudent to seek out and use primary sources of
external data.
Definition 1.1
Population and population characteristic
A population is the totality of units under study. A population
characteristic is an attribute of a population unit.
We may be interested, for example, in the salaries of workers in a particular
industry. If so, the population is the totality of these workers and the characteristic
of interest is each worker's salary. In collecting the salary data, we
may be interested in other population characteristics as well, including sex,
age, educational level, and other information. In general, a population unit
may have one or more characteristics of interest in a particular study.
Definition 1.2
A population census
A population census is the evaluation of each and every unit in the
population under study.
In some situations, it is possible to take a complete census of the population.
This rarely occurs in business unless the population size is very small,
due to cost and time considerations. A census of the U.S. population is undertaken
every ten years and it is truly a Herculean effort, subsidized, naturally,
by the taxpayers. The U.S. census produces a wealth of data of considerable
importance to the federal government and to firms and institutions, many of
whom view the census as an important source of external data.
In most instances, it is not possible to take a census of a population. It may
be too costly, too time consuming, or the evaluation process may destroy the
population unit as in Example 1.1.
Definition 1.3
A sample
A sample is a part of a population in which the population
characteristic is studied so that inferences may be made from the sample
study about the entire population.
Example 1.4
Suppose a tire manufacturer wishes to claim its new radial tire will last
40,000 miles or more. To support this claim, a sample of all tires produced
(the population) is selected for testing to determine how many miles the tires
will last. Since testing destroys the tires, a complete census of the population
is impossible.
may not be representative of the whole population.
Definition 1.4
Sampling error
Sampling error is the difference between the result of studying a
sample and inferring a result about the population, and the result of a
census of the whole population.
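Definition 1.4 can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions: the population of 1,000 salaries is randomly generated (it is not real data), the characteristic of interest is the mean salary, and the sample is a simple random sample of 50 workers.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: annual salaries (dollars) of 1,000 workers.
population = [random.randint(20_000, 60_000) for _ in range(1_000)]

census_mean = mean(population)            # what a census would report
sample = random.sample(population, 50)    # simple random sample, no repeats
sample_mean = mean(sample)                # what the sample study reports

# Sampling error: sample-based result minus census result.
sampling_error = sample_mean - census_mean

print(f"census mean:    {census_mean:,.2f}")
print(f"sample mean:    {sample_mean:,.2f}")
print(f"sampling error: {sampling_error:,.2f}")
```

Drawing a different sample changes the sampling error; no data were recorded incorrectly, so none of the discrepancy is nonsampling error.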
Example 1.5
Errors in acquiring and tabulating statistical data can arise in other ways as
well, and these errors are called nonsampling errors.
Definition 1.5
Nonsampling error
Nonsampling errors are errors that occur in acquiring, recording, or
tabulating statistical data that cannot be ascribed to sampling error.
They may arise in either a census or a sample.
Nonsampling errors are usually more difficult to control and detect than
sampling errors.
Example 1.6
Suppose we are acquiring data on the 1,000 unionized workers mentioned
above. If we approached a particular worker and asked for his or her income,
we could be lied to, a troublesome and frequent source of nonsampling error
when a sensitive question is asked directly of a person. (What is your grade
point average?) In some instances, a person may give a false response out of
ignorance rather than by design. Another source of nonsampling error is in
recording the data. A "7" may be written as a "9," the decimal point may be
incorrectly placed, and so on. Errors may also occur in tabulating the data:
keypunching errors in preparing computer cards and typing errors in
transcribing data, for instance. It is always necessary to carefully edit data to
minimize the chance of nonsampling errors adversely affecting the statistical
analysis of the data.
The identification of the units in a population under study can often be a
surprisingly difficult task. We refer to a listing of population units as a
frame.
Definition 1.6
Population frame
The listing of all units in the population under study is called the
population frame.
Example 1.7
If the population is a production lot of units stored in a warehouse,
production records will give us a listing of the serial numbers of the units
from which each unit may be identified.
Example 1.8
If the population is the 1,000 unionized workers in a specific industry, union
membership records may serve as a frame.
But, what about a frame for all persons who will vote in a particular
election? A listing of registered voters is not appropriate, because in many
elections less than 50 percent of the registered voters actually vote. Some
classic errors have been made in identifying the frame.
Most of us are very familiar with surveys. As students, we are asked about
our opinions regarding dining hall food, impending tuition hikes, teaching
effectiveness and so on. Filling out survey questionnaires or answering an
interviewer's questions has become a routine occurrence in most of our lives.
Example 1.9
Suppose we are interested in acquiring data on the salaries of 1,000
unionized workers in a specific industry. The population characteristic
"salary" may be affected by a host of factors: age, race, sex, educational
level, etc. As we elicit a particular worker's salary, we have no control over
educational level, age, and so on; these are existing attributes of the worker.
Definition 1.8
Statistical experiment
An experiment is a process of collecting data about population
characteristics when control is exercised over some or all factors that
may affect the characteristics of interest in the study.
Example 1.10
We may be interested in the yield of a chemical process that is affected by
temperature and pressure. A variety of settings for temperature and pressure
could be selected, and the chemical process run for each setting to determine
the yield. In this way, the joint effect of temperature and pressure on yield is
studied in a controlled manner.
Example 1.11
Experiments almost always provide better information than do surveys,
but both are extremely important and useful tools for acquiring data. Though
an experiment should be preferred to a survey, much of the data used in
statistical analyses in business and economics are survey data. There are a
number of reasons for this. First, most internal and external data are
collected by surveys. Second, it is not always possible to conduct an experiment
to acquire the needed information. An interesting example of this is the
effect of smoking on health. Virtually all data on the relationship between
smoking and health are survey data; other factors that may affect health,
such as age, race, sex, and physiological properties, are not in the control of
those collecting the data. To run an experiment in this case would involve
controlling persons' lives. Some people in the experiment would be required
to smoke while others would not. It is neither feasible nor desirable to
approach the acquisition of the data for the study of the relationship between
smoking and health in this way.
4. Designing the survey or experiment.
The experiment must be designed so that we isolate the characteristic of
interest, the lifetime of the battery. Test circuits must be constructed and
carefully monitored when the selected batteries are inserted for testing.
5. Collecting and analyzing the data.
For each battery, the time to failure is recorded and the proportion of batteries
lasting 20 hours or more is calculated.
6. Reaching conclusions about the population characteristics.
The sample proportion of batteries surviving 20 or more hours is used as an
estimate of the population proportion that survive 20 or more hours.
7. Reporting the results.
The report should include a thorough description of the problem, the
sampling design, the testing method and the inferences. Sufficient monies
should be allocated for a competent writing of the report. Indeed, many
companies employ technical writers to put into "laymen's" words the
experimental results.
although the distribution mechanism depends to a large extent on the
purpose and nature of the questionnaire. For example, if the purpose of the
questionnaire is to survey the attitudes of those using public transportation,
the questionnaire may be distributed to persons while commuting to and
from work on buses, subways, and trains.
The use of questionnaires suffers from two serious drawbacks. First, if the
respondent has difficulty in interpreting the questions, no one is available for
assistance. If this situation arises, the information received may contain a
high degree of nonsampling error or the respondent may become frustrated
and not bother completing or returning the questionnaire. Further, if a
questionnaire is mailed to a household, it is often not clear who in the household
responded to it. Second, questionnaires typically have an extremely
poor response rate. It is not uncommon to have less than 30 percent returned
on the first mailing of a questionnaire. The principal advantage of a
questionnaire is the low cost relative to the other means of obtaining
information. Most mail questionnaires may be bulk mailed at a reasonable
rate. But it is almost always necessary to contact nonrespondents to the first
mailing by subsequent mailings, telephone calls, or personal interviews, and
these costs must be planned for in a well-designed self-enumeration survey
or experiment. In most instances, those who do respond to the first mailing
of a questionnaire are not representative of the entire population. To use only
their responses would tend to bias the analytical results. Some self-
enumeration questionnaires do enjoy high initial response rates. Examples
are questions asked on warranty cards that must be returned to the
manufacturer for warranty coverage of a new product.
and sufficient remuneration must be provided to ensure that the interviewer
is competent and dedicated to the chore. It is always prudent in a personal
interview survey to call a selected set of respondents to ensure that they were
in fact contacted (as opposed to the interviewer filling in fake responses), to
ascertain if the interviewer's demeanor was appropriate, and to determine
whether the interviewer may have biased responses by making gestures
when stating the questions or recording the responses.
1.6.1 The Design
What is the likelihood of your using the following services for preventive
health care purposes in the next two years? (a) Dental check-up, (b) Eye
exam, (c) General physical.
                      a    b    c
Extremely unlikely    ()   ()   ()
Unlikely              ()   ()   ()
Slightly unlikely     ()   ()   ()
Not certain           ()   ()   ()
Slightly likely       ()   ()   ()
Likely                ()   ()   ()
Extremely likely      ()   ()   ()
In this case, the respondent would select the "extremely likely" response
in place of "yes" if he or she is given the multiple choice response format,
for instance.
Given to the typical college student, the response will invariably be,
"agree." A less biased question might be, "The food in the dining hall is of
acceptable quality."
1.6.3 Editing
In this instance, the five characteristics (income, age, race, level of
education, and sex) are variables in the survey or experiment.
Further, we would call income a dependent variable and the other four
independent variables if we are concerned with how sex, age, level of
education, and race affect income. Income is the basic variable of interest
and our interest in the other variables is in their influence on income.
If we are measuring a set of variables from a population, the determination
of which are dependent and independent variables is a function of the
purpose of the survey or experiment. An independent variable in one study
may be a dependent variable in another.
A quantitative variable is one that can be measured numerically, such as
income and age. A qualitative variable is one that is nonnumeric, such as
sex, race, and level of education (high school, college, graduate school,
etc.).
In preparing data for analysis, we must be familiar with the four numerical
scales of measurement: nominal, ordinal, interval and ratio. The nominal
scale applies whenever we have used numbers only to categorize outcomes
of a variable. For instance, we could let a "male" be 1 and a "female" be 0,
but this numerical assignment is clearly arbitrary; a female could be
assigned 100 and a male, 0. The ordinal scale differs from the nominal
scale in that the ordering of the numbers has meaning. An example is the
responses to a multiple-response question:
agree disagree
-2 -1 0 +1 +2
The numerical assignments of -2, -1, 0, +1, and +2 indicate the degree of
agreement, but they could just as easily have been 0, 10, 100, 200, and 500,
respectively. The key here is that while a -2 indicates stronger agreement
than a -1, the difference between -2 and -1 may not be the same as
between 0 and +1. In the interval scale, the relative order of the numbers
is important, but so is the difference between them. This scale uses the
concept of unit distance such that the difference between any two numbers
may be expressed as some number of units. The interval scale requires a
zero point, but its location may be arbitrary.
Good examples of interval scales are the Fahrenheit and Celsius
temperature scales. Both have different zero points and unit distances. The
principle of an interval scale is not violated by a change in scale or
location or both. The ratio scale is used when the interval size is important
and also the ratio between two numbers has meaning. By this, we mean it
is appropriate to speak of one number being, say, twice as big as another.
This is clearly not possible with an interval scale, where, for instance,
80°F is not twice as "hot" as 40°F; measured on the Celsius scale, these
two temperatures are about 27°C and 4°C, respectively, and 27°C is not twice
4°C. Examples of instances when ratio scales are appropriate are
measurements of heights, weights, and age. Most of the statistical methods
we will develop in this book require that the variable be measured at least
on the interval scale.
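The difference between the interval and ratio scales can be checked numerically. The following sketch converts the two temperatures discussed above and, for contrast, converts two lengths from inches to centimeters; the length values are hypothetical and serve only as an example of a ratio-scale variable.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9


f_low, f_high = 40.0, 80.0
c_low, c_high = fahrenheit_to_celsius(f_low), fahrenheit_to_celsius(f_high)

print(f"{f_low}°F = {c_low:.1f}°C and {f_high}°F = {c_high:.1f}°C")
print(f"ratio on the Fahrenheit scale: {f_high / f_low:.2f}")
print(f"ratio on the Celsius scale:    {c_high / c_low:.2f}")

# Ratio scale for contrast: a length keeps its ratios under a change of unit.
inches = (40.0, 80.0)  # hypothetical lengths
centimeters = tuple(x * 2.54 for x in inches)
print(f"ratio in inches:      {inches[1] / inches[0]:.2f}")
print(f"ratio in centimeters: {centimeters[1] / centimeters[0]:.2f}")
```

The temperature ratio changes with the choice of zero point, so "twice as hot" has no meaning; the length ratio is the same in either unit because zero length is not arbitrary.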
Tutorial 1
7. There are four measurement scales: nominal, ordinal, interval, and ratio.
Describe each, and give an example of a survey question that may use
measurements of each type.
8. For each of the following, indicate the scale of measurement:
a. Red (1), Blue (0), Yellow (-1)
b. Extremely Likely (5), Likely (4), Indifferent (3), Unlikely (2) and Extremely Unlikely (1).
c. Pressure in pounds per square inch; from 0 to ∞.
d. Volume in cubic centimeters from 0 to ∞.
e. Age in years 0 to ∞.
f. Salary in dollars 0 to ∞.
g. Rank of a state in population 1 to 50.
10. Distinguish between sampling and nonsampling error. Which can occur
in a census? Which can occur in a sample?
2
Frequency Distributions
2.1 Introduction
2.2 Frequency distributions
2.3 Graphical presentations
Tutorial 2
2.1 Introduction
distribution): It shows the frequencies with which the farm sizes are
distributed among the chosen classes. Tables of this sort, in which the data
are grouped according to numerical size, are called numerical or quantitative
distributions. In contrast, tables like the one given below, in which the data
are sorted according to certain categories, are called categorical or
qualitative distributions, as in Table 2.2 below:
Table 2.2
                                   1967 Motor Vehicle Registration
                                   (thousands)
United States                      96945
Other North and Central America     8900
South America                       5490
Europe                             65969
Africa                              3822
Asia                               13937
Oceania                             5519
2.2 Frequency Distributions
Note that the first three, but not the fourth, of these rules were observed in
the construction of the farm-size distribution on page 23, assuming that the
figures were rounded to the nearest acre. (Had these figures been rounded to
the nearest tenth of an acre, a farm of, say, 49.6 acres could not have been
accommodated, as it would have fallen between the second class and the
third.) The fourth rule was violated in two ways: First, the intervals from 10
to 49 acres, 100 to 179 acres, and 260 to 499 acres, among others, cover
unequal ranges of values. Second, the first and last classes are open; for all
we know, the last class might include farms of a million acres or more, and
if we had grouped profits and losses instead of acreages, the first class might
even have included negative values. If a set of data contains a few values
that are much greater (or much smaller) than the rest, open classes can help
to simplify the overall picture by reducing the number of required classes;
otherwise, open classes should be avoided as they can make it impossible (or
at least difficult) to give further descriptions of the data.
As we have pointed out in the preceding paragraph, the appropriateness of
a classification may depend on whether the data are rounded to the nearest
acre or to the nearest tenth of an acre. Similarly, it may depend on whether
data are rounded to the nearest dollar or the nearest cent, whether they are
given to the nearest inch, the nearest tenth of an inch, or the nearest
hundredth of an inch, and so on. Thus, if we wanted to group the amounts of
the sales made by a saleslady in a department store, we might use the
classification given in Table 2.3 below:
Table 2.3
Size of Sale
(dollars)
 0.00 -  4.99
 5.00 -  9.99
10.00 - 14.99
15.00 - 19.99
20.00 - 24.99
Etc.
And if we wanted to group the heights of children measured to the nearest
tenth of an inch, we might use the classification shown in Table 2.4:
Table 2.4
Height
(inches)
20.0 - 29.9
30.0 - 39.9
40.0 - 49.9
50.0 - 59.9
Etc.
Note that in each of these examples the nature of the data is such that a value
can fall into one and only one class.
To give a concrete illustration of the construction of a frequency distribution,
let us consider the following data (Table 2.5) representing the scores
which 150 applicants for secretarial positions in a large company obtained in
an achievement test:
Table 2.5
27 79 69 40 51 88 55 48 36 61
53 44 94 15 65 42 58 55 69 63
70 48 61 551 60 25 47 78 61 54
57 76 73 62 36 67 40 51 59 68
27 46 62 43 54 83 59 13 72 57
82 45 54 52 71 53 82 69 60 35
41 65 62 75 60 42 55 34 49 45
49 64 40 61 73 44 59 46 71 86
43 69 54 31 56 51 75 44 66 53
80 71 53 56 91 60 41 29 56 57
35 54 43 39 56 27 62 44 85 61
59 89 60 51 71 53 58 26 77 68
62 57 48 69 76 52 49 45 54 41
33 61 80 57 42 45 59 44 68 73
55 70 39 58 69 51 85 46 55 67
Since the smallest of these scores is 17 and the largest is 94, it would
seem reasonable (for most practical purposes) to choose the nine classes
going from 10 to 19, from 20 to 29, . . ., and from 90 to 99. Performing the
actual tally and counting the number of values falling into each class, we
obtain the results shown in Table 2.6. The numbers shown in the right-hand
column of this table are called class frequencies; they give the number of
items falling into each class. Also, the smallest and the largest values that
can go into any given class are referred to as its class limits; thus, the class
limits of the above table are 10 and 19, 20 and 29, 30 and 39, and so on.
More specifically, 10, 20, 30, . . ., and 90 are referred to as the lower class
limits, while 19, 29, 39, ..., and 99 are referred to as the upper class limits of
the respective classes.
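The tally described above can be sketched in Python. The function name and its parameters are our own choices, not part of the text; to keep the illustration short it is applied only to the ten scores in the first row of Table 2.5, using the nine classes 10-19, 20-29, ..., 90-99.

```python
from collections import Counter


def frequency_distribution(values, lower=10, upper=99, width=10):
    """Count how many whole-number values fall into each class.

    Classes run lower..lower+width-1, lower+width..lower+2*width-1, and so on.
    A value outside lower..upper raises an error, since every item must be
    accommodated by exactly one class.
    """
    counts = Counter()
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"{v} is not accommodated by any class")
        start = lower + ((v - lower) // width) * width  # lower class limit
        counts[(start, start + width - 1)] += 1
    classes = [(s, s + width - 1) for s in range(lower, upper + 1, width)]
    return {c: counts[c] for c in classes}


first_row = [27, 79, 69, 40, 51, 88, 55, 48, 36, 61]  # first row of Table 2.5
for (low, high), freq in frequency_distribution(first_row).items():
    print(f"{low}-{high}: {freq}")
```

Each key of the result is a pair of class limits and each value is the corresponding class frequency; the frequencies add up to the number of items tallied.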
Table 2.6
instead as the "real" class limits. In order to make this concept apply also to
the classes which are at the two extremes of a distribution, we simply act as
if the table were continued in both directions. Thus, the first class of the
above distribution of the 150 scores has the lower boundary 9.5, while the
last class has the upper boundary 99.5.
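As a check on the two boundaries just quoted, the sketch below computes class boundaries for the score distribution. It assumes the usual convention that a boundary lies midway between the upper limit of one class and the lower limit of the next, which for whole-number scores is half a unit beyond each class limit; the variable names are ours.

```python
# Class limits of the score distribution: 10-19, 20-29, ..., 90-99.
class_limits = [(low, low + 9) for low in range(10, 100, 10)]

unit = 1  # the scores are whole numbers, so adjacent limits differ by 1

# Extend each class by half a unit on both sides (assumed convention).
class_boundaries = [(low - unit / 2, high + unit / 2) for low, high in class_limits]

for (low, high), (lb, ub) in zip(class_limits, class_boundaries):
    print(f"limits {low}-{high}: boundaries {lb} to {ub}")
```

The first lower boundary comes out as 9.5 and the last upper boundary as 99.5, and the upper boundary of each class coincides with the lower boundary of the next.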
Suppose now that in connection with the scores of the 150 applicants for
secretarial positions, it is of interest to know how many fell below various
levels. To provide this information, we have only to convert the distribution,
Table 2.6 on page 30 into what is called a cumulative frequency distribution
or simply a cumulative distribution. Successively adding the frequencies in
the table, we thus obtain the following "less than" cumulative distribution,
shown in Table 2.7:
Table 2.7
Scores             Cumulative Frequencies
Less than 10         0
Less than 20         1
Less than 30         7
Less than 40        16
Less than 50        47
Less than 60        89
Less than 70       121
Less than 80       138
Less than 90       148
Less than 100      150
Note that in this table we could just as well have written "9 or less" instead
of "less than 10," "19 or less" instead of "less than 20," ..., and "99 or less"
instead of "less than 100."
If we successively add the frequencies starting at the other end of the
distribution, we similarly get a cumulative "or more" distribution (or a
cumulative "more than" distribution), which shows how many of the scores
are "10 or more" (or "more than 9"), how many are "20 or more" (or "more
than 19"), and so on.
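Both cumulative distributions follow from running sums of the class frequencies, as the short sketch below shows. The class frequencies are those of the 150 scores; the values 32 and 17 for the classes 60-69 and 70-79 are obtained as differences of successive entries in the "less than" table (121 - 89 and 138 - 121).

```python
from itertools import accumulate

lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [1, 6, 9, 31, 42, 32, 17, 10, 2]  # classes 10-19, ..., 90-99

total = sum(frequencies)
cut_points = lower_limits + [100]

# "Less than" distribution: running total of the frequencies below each cut.
less_than = [0] + list(accumulate(frequencies))

# "Or more" distribution: whatever is not below the cut is at or above it.
or_more = [total - below for below in less_than]

for cut, below, above in zip(cut_points, less_than, or_more):
    print(f"less than {cut:3d}: {below:3d}    {cut:3d} or more: {above:3d}")
```

For every cut point the two cumulative frequencies add up to the total of 150.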
Sometimes it is preferable to show what percentage of the items falls into
each class, or what percentage of the items falls above or below various
values. To convert a frequency distribution (or a cumulative distribution)
into a corresponding percentage distribution, we have only to divide each
class frequency (or each cumulative frequency) by the total number of items
grouped and multiply by 100. For instance, for the size-of-farm distribution
on page 25, it may be more informative to indicate that
(183/3,156)100 = 5.8 per cent of the farms are under 10 acres, that
(637/3,156)100 = 20.2 per cent of the farms are from 10 to 49 acres, and so
on. Generally speaking, percentage distributions are useful, especially when
we want to compare two or more sets of data.
For instance, it may well be more informative to say that the percentages of
farms under 10 acres in two counties are, respectively, 5 per cent and 6 per
cent, than to report that in one county 16 of 321 farms and in the other
county 43 of 717 farms are under 10 acres.
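The conversion to percentages is a one-line calculation, sketched here with the figures quoted in the text; the helper name `percentage` is our own.

```python
def percentage(class_frequency, total):
    """Class frequency expressed as a percentage of all items grouped."""
    return 100 * class_frequency / total


# Size-of-farm distribution: 3,156 farms in all.
print(f"under 10 acres: {percentage(183, 3156):.1f} per cent")
print(f"10 to 49 acres: {percentage(637, 3156):.1f} per cent")

# Comparing two counties of different size.
print(f"first county:  {percentage(16, 321):.0f} per cent of farms under 10 acres")
print(f"second county: {percentage(43, 717):.0f} per cent of farms under 10 acres")
```

The raw counts 16 and 43 suggest a large difference between the counties, while the percentages show that the two are nearly alike.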
So far we have discussed only numerical distributions, but the general
problem of constructing categorical (or qualitative) distributions is very
much the same. Again we must decide how many classes (categories) to use
and what kind of items each category is to contain, making sure that all of
the items are accommodated and that there are no ambiguities. Since the
categories must often be selected before any data are actually obtained,
sound practice is to include a category labeled "others" or "miscellaneous."
When dealing with categorical distributions we do not have to worry about
such mathematical details as class limits, class boundaries, class marks, etc.;
on the other hand, we now have a more serious problem with ambiguities,
and we must be careful and explicit in defining what each category is to
contain. For instance, if we tried to classify items sold at a supermarket into
"meats," "frozen foods," "baked goods," and so on, it would be difficult to
decide where to put, for example, frozen beef pies. Similarly, if we wanted
to classify occupations, it would be difficult to decide where to put a farm
manager, if our table contained (without qualification) the two categories
"farmers" and "managers." For this reason, it is often advisable to use
standard categories developed by the Bureau of the Census and other
government agencies.
The most common among all graphical presentations of statistical data is the
histogram, an example of which is shown in figure 2.1. A histogram is
constructed by representing
measurements or observations that are grouped (in figure 2.1 the scores)
on a horizontal scale, the class frequencies on a vertical scale, and
drawing rectangles whose bases equal the class interval and whose
heights are determined by the corresponding class frequencies. The
markings on the horizontal scale can be the class limits as in figure 2.1,
the class boundaries, the class marks, or arbitrary key values.
For easy readability it is generally preferable to indicate the class limits,
although the bases of the rectangles actually go from one class boundary to
the next. Similar to histograms are bar charts, like the one of figure 2.2,
where the lengths of the bars are proportional to the class frequencies, but
there is no pretense of having a continuous (horizontal) scale.
There are several points that must be watched in the construction of
histograms. First, it must be remembered that this kind of figure cannot
be used for distributions with open classes. Second, it should be noted
that the picture presented by a histogram can be very misleading if a
distribution has unequal classes and no suitable adjustments are made. To
illustrate this point, let us regroup the distribution of the 150 scores by
combining all those from 60 to 79 into one class. Thus, the new distribution
is given by the following table
Table 2.8
Scores Frequency
10-19 1
20-29 6
30-39 9
40-49 31
50-59 42
60-79 49
80-89 10
90-99 2
and its histogram (with the class frequencies represented by the heights of
the rectangles) is shown in figure 2.3. This figure gives the impression that
just about half the scores fall on the interval from 60 to 79, whereas
the correct proportion is close to 1/3, 49/150 to be exact. This error is due
to the fact that when we compare the size of rectangles, triangles, and
other plane figures, we instinctively compare their areas and not their
sides. In order to correct for this, we simply draw the rectangles of the
histogram so that the class frequencies are represented by their areas, and
not by their heights. In figure 2.4 we accomplished this by reducing the
height of the rectangle representing the class 60-79 to half of what it was
in figure 2.3.
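The area adjustment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classes are those of Table 2.8, a width of 10 is taken as the standard class interval, and the names `classes`, `STANDARD_WIDTH`, and `adjusted_heights` are hypothetical, not part of the text.

```python
# Classes of Table 2.8 as (lower limit, upper limit, frequency).
classes = [(10, 19, 1), (20, 29, 6), (30, 39, 9), (40, 49, 31),
           (50, 59, 42), (60, 79, 49), (80, 89, 10), (90, 99, 2)]

STANDARD_WIDTH = 10  # assumption: the usual class interval is the reference


def adjusted_heights(classes, standard_width=STANDARD_WIDTH):
    """Return rectangle heights so that areas represent class frequencies."""
    heights = []
    for lower, upper, freq in classes:
        width = upper - lower + 1  # class interval, e.g. 60-79 has width 20
        heights.append(freq * standard_width / width)
    return heights


heights = adjusted_heights(classes)
for (lower, upper, freq), height in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {height}")
```

The class 60-79 is twice as wide as the others, so its height is halved, while every class of standard width keeps a height equal to its frequency.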
The practice of representing class frequencies by means of areas is
especially important if histograms are to be approximated with smooth
curves. For instance, if we wanted to approximate the histogram of
figure 2.1 with a smooth curve, we could say that the number of scores
exceeding 69 is given by the shaded area of figure 2.5. Clearly, this area
is approximately equal to the sum of the areas of the corresponding three
rectangles.
An alternate, though less widely used, form of graphical presentation
is the frequency polygon (see figure 2.6). Here the class frequencies are
plotted at the class marks and the successive points are connected by
means of straight lines. Note that we added classes with zero frequencies
at both ends of the distribution in order to "tie down" the graph to the
horizontal scale.
If we apply the same technique to a cumulative distribution, we obtain
what is called an ogive. Note, however, that now the cumulative frequencies
are not plotted at the class marks; it stands to reason that the cumulative
frequency corresponding, say, to "less than 20" in our example should be
plotted at 20, or preferably at the class boundary of 19.5, since "less than 20"
actually includes everything up to 19.5. Figure 2.7 shows an ogive
representing the cumulative "less than" distribution of the scores of the 150
applicants.
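As a rough sketch, the points of such an ogive can be computed as running totals of the class frequencies. The frequencies below are those of the nine-class distribution of the 150 scores used in Example 3.2; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Distribution of the 150 scores: lower class limits and class frequencies.
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [1, 6, 9, 31, 42, 32, 17, 10, 2]

# Running totals give the number of scores below each successive class.
cumulative = list(accumulate(frequencies))

# Each cumulative frequency is plotted at the upper class boundary,
# that is, at 19.5, 29.5, ..., 99.5; the graph starts at (9.5, 0).
upper_boundaries = [limit + 9.5 for limit in lower_limits]
ogive_points = [(9.5, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, count in ogive_points:
    print(f"less than {boundary + 0.5:g}: {count}")
```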
Tutorial 2
1. Decide for each of the following quantities whether it can be determined
on the basis of the distribution of the 150 scores on page 30; if
possible, give a numerical answer:
(a) The number of scores which were at least 50.
(b) The number of scores which were greater than 50.
(c) The number of scores which were 80 or less.
(d) The number of scores which were less than 80.
(e) The number of scores which were more than 90.
(f) The number of scores which were greater than 39 but at most 69.
2. If the amounts paid for the repairs of cars damaged in accidents are
grouped into a frequency table with the classes $0.00-$99.99,
$100.00-$199.99, $200.00-$299.99, $300.00-$399.99, $400.00-
$499.99, and $500.00 or more, decide for each of the following
quantities whether it can be determined on the basis of this
distribution:
(a) How many of the amounts were less than $200.00.
(b) How many of the amounts were at least $200.00.
(c) How many of the amounts were more than $200.00.
(d) How many of the amounts were $200.00 or more.
3. The following is the distribution of the weekly earnings of 1,216
secretaries in the Phoenix, Arizona, metropolitan area in March 1969:
Number of
Weekly Earnings Secretaries
Under $80 21
$80- $99 296
$100-$119 494
$120-$139 247
$140-$159 119
$160 and over 39
(a) The number of secretaries with weekly earnings of at least $120.
(b) The number of secretaries with weekly earnings of more than $120.
(c) The number of secretaries with weekly earnings of more than $180.
(d) The number of secretaries with weekly earnings of less than $100.
(e) The number of secretaries with weekly earnings of at most $100.
(f) The number of secretaries with weekly earnings of at least $60.
4. The numbers of students absent from school each day are grouped into a
distribution having the classes 3-10, 11-18, 19-26, 27-34, and 35-42. Find
(a) the limits of each class, (b) the class boundaries, and (c) the class
marks.
5. The following is the distribution of the actual weight (in ounces) of 50
"one-pound" bags of coffee, which a grocery clerk filled from bulk stock:
Weight Number of bags
15.5 – 15.6 3
15.7 – 15.8 9
15.9 – 16.0 17
16.1 – 16.2 14
16.3 – 16.4 6
16.5 – 16.6 1
Find (a) the limits of each class, (b) the class marks, and (c) the class
boundaries.
6. The weights of certain laboratory animals, given to the nearest tenth of
an ounce, are grouped into a table having the class boundaries 11.45,
13.45, 15.45, 17.45, and 19.45 ounces. What are the limits of the four
classes of this distribution?
7. The class marks of a distribution of temperature readings, given to the
nearest degree Fahrenheit, are 113, 128, 143, 158, and 173. Find the
class boundaries of this distribution, and also the class limits.
8. Class limits and class boundaries have to be interpreted very carefully
when we are dealing with ages, for the age group from 5 through 9, for
example, includes all those who have passed their fifth birthday but not
yet reached their tenth. Taking this into account, what are the boundaries
and the class marks of the following age groups: 10-19, 20-29, 30-39, and
40-49.
9. A study of air pollution in a city yielded the following daily readings of
the concentration of sulfur dioxide (in parts per million):
.04 .11 .05 .01 .15 .12 .19 .06 .13 .03
.18 .01 .08 .11 .08 .14 .02 .14 .08 .10
.17 .09 .14 .07 .13 .11 .09 .05 .15 .08
.06 .05 .12 .10 .27 .12 .16 .10 .09 .15
.07 .10 .17 .13 .20 .18 .11 .17 .14 .04
.22 .11 .09 .02 .12 .16 .15 .12 .13 .07
.05 .14 .04 .16 .19 .10 .06 .03 .16 .13
.18 .13 .11 .09 .06 .23 .11 .12 .07 .11
(a) Group these data into a table having the classes .00-.04, .05-.09,
.10-.14, .15-.19, .20-.24, and .25-.29.
(b) Convert the distribution obtained in (a) into a cumulative "less
than" distribution.
(c) Construct a histogram of the distribution obtained in (a).
(d) Draw an ogive of the cumulative "less than" distribution obtained
in (b) and use it to read off (roughly) the value below which we
should find the lowest half of the data.
10. The following are the numbers of customers a restaurant served for
lunch on 120 weekdays:
50 64 55 51 60 41 71 53 63 64 46 59
66 45 61 57 65 62 58 65 55 61 50 55
53 57 58 66 53 56 64 46 59 49 64 60
58 64 42 47 59 62 56 63 61 68 57 51
61 51 60 59 67 52 52 58 64 43 60 62
48 62 56 63 55 73 60 69 53 66 54 52
56 59 65 60 61 59 63 56 62 56 62 57
57 52 63 48 58 64 59 43 67 52 58 47
63 53 54 67 57 61 65 78 60 66 63 58
60 55 61 59 74 62 49 63 65 55 61 54
(a) Group these data into a table having the classes 40-44, 45-49,
50-54, 55-59, 60-64, 65-69, 70-74, and 75-79.
(b) Convert the distribution obtained in (a) into a cumulative "less than"
distribution.
(c) Construct a histogram of the distribution obtained in (a).
(d) Draw an ogive of the cumulative "less than" distribution obtained in (b)
and use it to read off (roughly) the value below which we should find the
lowest half of the data.
3
Measures of Location
3.1 Introduction
3.2 The arithmetic mean
3.3 The mean of a distribution
3.4 The coding method
3.5 The mode
3.6 The median
3.7 Other numerical measures
3.7.1 Geometric mean
3.7.2 Quartiles and Percentiles
3.1 Introduction
Descriptions of statistical data can be quite brief or quite elaborate,
depending partly on the nature of the data themselves, and partly on the
purpose for which they are to be used. Sometimes, we even describe the
same set of data in several different ways. To draw an analogy, a large motel
might describe itself to the public as having luxurious facilities, a heated
swimming pool, and TV in every room; on the other hand, it might describe
itself to the fire department by giving the floor space of each unit, the
number of sprinklers, and the number of employees. Both of these
descriptions may serve the purpose for which they are designed, but they
would hardly satisfy the State Corporation Commission in passing on the
owner's application for issuing stock. This would require detailed
information on the management of the motel, various kinds of financial
statements, and so on.
Whether we describe things statistically or whether we simply describe
them verbally, it is always desirable to say neither too little nor too much.
Thus, it may sometimes be satisfactory to present data simply as they are
and let them "speak for themselves"; in other instances it may be satisfactory
to group, classify, and present them using the methods of Chapter 2.
However, most of the time it is necessary to summarize them further by
means of one or more well-chosen descriptions. In this chapter and in
chapter 4 we shall concentrate mainly on two kinds of descriptions, called
measures of location and measures of variation.
The measures of location we shall study in this chapter are also referred
to at times as "measures of central tendencies," "measures of central values,"
and "measures of position." Except for some of the measures discussed in
Section 3.4, they may also be referred to crudely as "averages" in the sense
that they provide numbers that are indicative of the "center," "middle," or the
"most typical" of a set of data.
When we said that the choice of a statistical description depends partly
on the nature of the data themselves, we were referring among other things
to the following distinction: if a set of data consists of all conceivably possible
(or hypothetically possible) observations of a certain phenomenon, we
refer to it as a population; if it contains only part of these observations, we
refer to it as a sample. The qualification "hypothetically possible" was
added to take care of such clearly hypothetical situations where, say, twelve
flips of a coin are looked upon as a sample from the population of all
possible flips of the coin, or where we shall want to look upon the weights of
eight 30-day-old calves as a sample of the weights of all (past, present, and
future) 30-day-old calves. In fact, we often look upon the results obtained in
an experiment as a sample of what we might obtain if the experiment were
repeated over and over again.
In this chapter and the next we shall limit ourselves to methods of
description without making generalizations, but it is important even here to
distinguish between samples and populations. As we have said before, the
kind of description we may want to use will depend on what we intend to do
later on, whether we merely want to present facts about populations or
whether we want to generalize from samples. We shall, thus, begin in this
chapter with the practice of using different symbols depending on whether
we are describing samples or populations; in Chapter 4 we shall carry this
distinction one step further by even using different formulas.
Definition 3.1
Arithmetic mean
The arithmetic mean of a set of n numbers is defined simply as their sum
divided by n.
Example 3.1
Given that the total attendance at major league baseball games in the years
1965, 1966, 1967, and 1968 was, respectively, 22.4, 25.2, 23.8, and 23.0
million, we find that the mean, namely, the "average" annual attendance for
these four years, was (22.4 + 25.2 + 23.8 + 23.0)/4 = 23.6 million.
In order to develop a simple formula for the mean that is applicable to any
set of data, it will be necessary to represent the figures (measurements or
observations) to which the formula is to be applied with some general
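symbols such as x, y, or z. In the above example, we could have represented
the annual attendance figures with the letter x and referred to the four values
as $x_1$, $x_2$, $x_3$, and $x_4$. More generally, for n values $x_1, x_2, \ldots, x_n$, the mean is
$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$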
This formula is perfectly general and it will take care of any set of data, but
it is still somewhat cumbersome. To make it more compact, we introduce the
symbol ∑ (capital sigma, the Greek letter for S), which is simply a
mathematical shorthand notation indicating the process of summation or
addition. If we write ∑x, this represents the "sum of the x's," and we have
$$\bar{x} = \frac{\sum x_i}{n}$$
Using the sigma notation in this form, the number of terms to be added is not
stated explicitly; it is tacitly understood, however, to refer to all the x's with
which we happen to be concerned. We shall finish simplifying our notation by
assigning a special symbol to the mean itself. If we look upon the x's as a
sample, we write their mean as $\bar{x}$ (x-bar); if we look upon them as a
population, we write their mean as μ. If we refer to sample data as y's or z's,
we correspondingly write their means as $\bar{y}$ or $\bar{z}$. To further emphasize the
distinction between samples and populations, we denote the number of
values in a sample, the sample size, with the letter n and the number of
values in a population, the population size, with the letter N. We thus have
the formulas
$$\bar{x} = \frac{\sum x_i}{n}, \qquad \mu = \frac{\sum x_i}{N},$$
depending on whether we are dealing with a sample or a population. In order
to distinguish between descriptions of samples and descriptions of
populations, statisticians not only use different symbols, but they refer to the
first as statistics and the second as parameters. Hence, we say that $\bar{x}$ is a
statistic and that µ is a parameter.
The popularity of the mean as a measure describing the "middle" or
"center" of a set of data is not just accidental. Anytime we use a single
number to describe a set of data, there are certain desirable properties we
must keep in mind.
Thus, some of the noteworthy properties of the mean are:
(1) it is familiar to most persons, although they may not call it by this name,
(2) it always exists, that is, it can be calculated for any kind of numerical
data,
(3) it is always unique, or in other words, a set of data has one and only one
mean,
(4) it takes into account each individual item,
(5) it lends itself to further statistical manipulation (it is possible to
combine the means of several sets of data into an over-all mean without
having to refer back to the original raw data), and
(6) it is relatively reliable in the sense that it does not vary too much when
repeated samples are taken from one and the same population, at least not as
much as some other kinds of statistical descriptions.
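Property (5) can be illustrated with a short Python sketch. The split of the Example 3.1 attendance figures into two groups is purely hypothetical, as are the function names; the point is only that the over-all mean is recovered from the group means and group sizes.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their number."""
    return sum(values) / len(values)


def combined_mean(means, sizes):
    """Over-all mean of several data sets from their means and sizes only."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)


# Attendance figures of Example 3.1 (in millions), split into two groups
# purely for illustration.
first, second = [22.4, 25.2], [23.8, 23.0]
overall = combined_mean([mean(first), mean(second)],
                        [len(first), len(second)])
print(round(mean(first + second), 1), round(overall, 1))
```

Each group mean is weighted by the size of its group, so no reference to the raw data is needed once the means and sizes are known.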
This question of reliability is of fundamental importance when it comes
to problems of estimation, hypothesis testing, and making predictions, and
we shall have a good deal more to say about it later in this book.
Since the computation of means is quite easy, involving only addition and
one division, there is usually no need to look for short-cuts or simplifications.
However, if the numbers are unwieldy, that is, if each number has
many digits, or if the sample (or population) size is very large, it may be
advantageous to group the data first and then compute the mean from the
resulting distribution. Another reason why we shall investigate the problem
of obtaining means from grouped data is that published data are very often
available only in the form of distributions.
3.3 The mean of a distribution
Definition 3.2
Mean of a distribution
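The mean of a distribution with class marks $x_1, x_2, \ldots, x_k$ and class
frequencies $f_1, f_2, \ldots, f_k$ is
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{\sum x_i f_i}{n},$$
where n equals $f_1 + f_2 + \cdots + f_k$, the sum of the class frequencies, or $\sum f_i$.
(When dealing with a population instead of a sample, we have only to
substitute µ for $\bar{x}$ in this formula and N for n.)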
Example 3.2
To illustrate the calculation of the mean of a distribution, let us refer again
to the distribution of the scores of the 150 applicants in Chapter 2. Writing
the class marks in the second column, we get
Class     Class marks x   Frequencies f   Products x·f
10-19         14.5               1             14.5
20-29         24.5               6            147.0
30-39         34.5               9            310.5
40-49         44.5              31           1379.5
50-59         54.5              42           2289.0
60-69         64.5              32           2064.0
70-79         74.5              17           1266.5
80-89         84.5              10            845.0
90-99         94.5               2            189.0
Total                          150           8505.0
$$\bar{x} = \frac{8505.0}{150} = 56.7$$
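The same calculation can be written as a minimal Python sketch; the function name `grouped_mean` is an illustrative assumption, while the class marks and frequencies are those of the table above.

```python
# Class marks and frequencies of the distribution of the 150 scores.
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [1, 6, 9, 31, 42, 32, 17, 10, 2]


def grouped_mean(marks, freqs):
    """Mean of a distribution: the sum of x*f divided by the sum of f."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(marks, freqs)) / n


x_bar = grouped_mean(class_marks, frequencies)
print(x_bar)
```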
The calculation of the mean of the distribution of the 150 scores was fairly
easy because the frequencies were all small. Even so, the calculations can be
simplified by performing a change of scale; that is, we replace the class
marks with numbers that are easier to handle. This is also referred to as
"coding," and in our example, we might replace the class marks of the
distribution of the scores with the consecutive integers -4, -3, -2, -1, 0, 1, 2,
3, and 4. Of course, when we do something like this, we also have to account
for it in the formula we use to calculate the mean. Referring to the new
(coded) class marks as u's, it can easily be shown that the formula for the
mean of a distribution becomes
Definition 3.3
Coding (shortcut) Mean
The coding (shortcut) mean is given by
$$\bar{x} = x_0 + \frac{\sum u_i f_i}{n} \cdot c,$$
where $x_0$ is the class mark (in the original scale) to which we assign 0 in the
new scale, c is the class interval, n is the number of items grouped, and
$\sum u_i f_i$ is the sum of the products obtained by multiplying each of the coded
class marks by the corresponding frequency.
Example 3.3
Class marks    u_i    f_i    u_i·f_i
   14.5        -4      1       -4
   24.5        -3      6      -18
   34.5        -2      9      -18
   44.5        -1     31      -31
   54.5         0     42        0
   64.5         1     32       32
   74.5         2     17       34
   84.5         3     10       30
   94.5         4      2        8
Total                150       33
$$\bar{x} = 54.5 + \frac{33}{150} \cdot 10 = 56.7.$$
It should be noted that this agrees with the result obtained earlier; the short-cut
formula does not entail any further approximation, and it should always
yield the same result as the formula of Definition 3.2.
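A brief Python sketch can confirm that the coded and the direct calculation agree. It assumes equal classes, so that each coded mark is a whole number; the function name `coded_mean` is hypothetical.

```python
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [1, 6, 9, 31, 42, 32, 17, 10, 2]


def coded_mean(marks, freqs, x0, c):
    """Short-cut mean: x0 + (sum of u*f / n) * c, with u = (x - x0) / c."""
    n = sum(freqs)
    # Coded class marks -4, ..., 4; rounding assumes equal class intervals.
    u = [round((x - x0) / c) for x in marks]
    sum_uf = sum(ui * f for ui, f in zip(u, freqs))
    return x0 + sum_uf / n * c


direct = sum(x * f for x, f in zip(class_marks, frequencies)) / sum(frequencies)
coded = coded_mean(class_marks, frequencies, x0=54.5, c=10)
print(direct, coded)
```

Placing the zero of the u-scale at 54.5, the class mark with the highest frequency, keeps the products small.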
Unless one can use an automatic computer, the short-cut method will
generally save a good deal of time; about the only time that the short-cut
method will not provide appreciable savings in time and energy is when the
original class marks are already easy-to-use numbers. In order to reduce the
work to a minimum, it is generally advisable to put the zero of the u-scale
near the middle of the distribution, preferably at a class mark having one of
the highest frequencies.
Remark 3.1:
A fact worth noting is that this short-cut method cannot be used for
distributions with unequal classes, although there exists a modification
which makes it applicable also in that case. Neither the short-cut formula nor
the formula of Definition 3.2 is applicable to distributions with open classes;
the means of such distributions cannot be found without going back to the
raw data or making special assumptions about the values which fall into an
open class.
Tutorial 3.1
(What general simplification does this suggest for the
calculation of means?)
7. Twenty-four cans of a floor wax, randomly selected from a large
production lot, have the following net weights (in ounces): 12.0, 11.9,
12.2, 12.0, 11.9, 12.0, 12.0, 12.1, 11.8, 12.0, 12.0, 12.1, 11.9, 11.9, 12.2,
12.1, 12.0, 11.9, 11.9, 12.1, 12.0, 12.0, 11.9, and 12.0.
(a) Calculate the mean of these 24 weights.
(b) Recalculate the mean of these 24 weights by first
subtracting 12.0 from each value, finding the mean of the
numbers thus obtained, and then adding 12.0 to the result.
(What general simplification does this suggest for the
calculation of means?)
8. The following are the number of twists that were required to
break 20 forged alloy bars: 37, 29, 34, 21, 54, 38, 30, 26, 48, 37,
24, 33, 39, 51, 44, 38, 35, 29, 46, and 31. Find the mean of these
values.
9. In business and economics, there are many problems in which
we are interested in index numbers, that is, in measures of the
changes that have taken place in the prices (quantities, or values)
of various commodities. In general, the year or period we want to
compare by means of an index number is called the given year or
given period, while the year or period relative to which the
comparison is made is called the base year or base period.
Furthermore, given-year prices are denoted $p_n$, base-year prices
are denoted $p_0$, and the ratio $p_n / p_0$ for a given commodity is
called the corresponding price relative. A very simple kind of
index number is given by the mean of the price relatives of the
commodities with which we are concerned, multiplied by 100 to
express the index as a percentage.
(a) Find the mean of the price relatives comparing the 1969
prices of the given processed fruits and vegetables (in cents) with
those of 1965:
1965 1969
Fruit cocktail, No. 303 can 26.1 27.9
Pears , No.21 can 47.0 50.9
Frozen orange juice, 6 oz, 23.7 24.3
Pears , No.303 can 23.7 24.6
Tomatoes, No. 303 can 16.1 19.6
Frozen broccoli, 10 oz, 26.4 27.6
(b) Find the mean of the price relatives comparing the following 1967
prices with those of 1960, where all prices are in cents per pound:
1960 1967
Copper 32.4 38.6
Lead 11.9 14.0
Zinc 12.9 13.8
10. If we substitute q's for p's in the index number of Exercise 9, where
given-year quantities (produced, sold, or consumed) are denoted qn and
base-year quantities are denoted q0 , we obtain a corresponding quantity
index. Given the following data in thousands of short tons, find the mean
of the quantity relatives comparing the 1967 production figures with
those of 1960:
1960 1967
Copper 1080 954
Lead 247 317
Zinc 435 549
11. Another way of obtaining an index comparing given-year prices with a
corresponding set of base-year prices (see Exercise 9) is to average the
two sets of prices separately, take the ratio of the two means, and then
multiply by 100 to express the index as a percentage. Canceling
denominators, the formula for such a simple aggregative index is thus
$$I = \frac{\sum p_n}{\sum p_0} \cdot 100.$$
3.5 The Mode
Definition 3.4
Mode
The mode, denoted by mo, of a set of n observations $x_1, x_2, \ldots, x_n$
(or of a frequency table) is the value of x which occurs with the
greatest frequency.
Example 3.4
Solution:
mo=3
Example 3.5
Example 3.6
Solutions:
In this case, we say that the set of numbers does not have a
mode.
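The definition of the mode can be sketched in Python as follows. The data are hypothetical, and the convention that a set has no mode when every distinct value occurs equally often is an assumption made for this sketch.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if all distinct values tie, no value stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 3, 3, 4, 3, 2]))  # hypothetical data
print(modes([5, 7, 9, 11]))       # hypothetical data, each value occurs once
```

Returning a list rather than a single number allows for data sets in which two or more values share the greatest frequency.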
One of these is the median, which is defined simply as:
Definition 3.5
Median
The median of a set of data is the value of the middle item (or the mean of
the values of the two middle items) when the data are arranged in an
increasing or decreasing order of magnitude.
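A minimal Python sketch of this definition, using hypothetical data, distinguishes the cases of an odd and an even number of items:

```python
def median(values):
    """Middle item of the ordered data, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 2, 9, 4, 5]))      # hypothetical data, odd count
print(median([7, 2, 9, 4, 5, 10]))  # hypothetical data, even count
```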
To find the median of a distribution with a total frequency of n, we
must, so to speak, count n/2 items starting at either end and use Definition 3.6.
Definition 3.6
Median
Example 3.7
To illustrate this procedure, let us refer again to the distribution of the 150
scores. Since n = 150 in this example, we will have to count n/2 = 75 items
from either end. Beginning at the bottom of the distribution, we find that 47
of the values are less than 50 while 89 are less than 60, so that the median
must fall into the class whose limits are 50-59. Since 47 of the values fall
below this class, we must count another 75 - 47 = 28 of its 42 values, and we
accomplish this by adding 28/42 of the class interval of 10 to 49.5, the lower
boundary of the class. (We add 28/42 of the class interval because we want to
count 28 of the 42 values contained in this class.) We thus get
$$\text{median} = 49.5 + \frac{28}{42} \cdot 10 \approx 56.17.$$
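The counting procedure of this example can be sketched in Python. The lower class boundaries and the function name `grouped_median` are assumptions introduced for illustration; the frequencies are those of the distribution of the 150 scores.

```python
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [1, 6, 9, 31, 42, 32, 17, 10, 2]


def grouped_median(boundaries, freqs, c):
    """Interpolate within the class that contains the (n/2)th item."""
    half = sum(freqs) / 2
    below = 0  # number of items in the classes already passed
    for lower, f in zip(boundaries, freqs):
        if f > 0 and below + f >= half:
            # Count the remaining (half - below) of the f items in this class.
            return lower + (half - below) / f * c
        below += f
    raise ValueError("the distribution contains no items")


print(round(grouped_median(lower_boundaries, frequencies, c=10), 2))
```

Here 47 items lie below the class 50-59, so the function counts the remaining 28 of its 42 items, starting from the lower boundary 49.5.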
3.7 Other Numerical Measures
There are numerous representative measures other than the mean, median,
and mode. The geometric mean is commonly used in business problems to
describe the "average" of ratios, although it is not as important as the three
principal representative measures (mean, median, and mode). It is defined by
$$GM = \left(x_1 \cdot x_2 \cdot x_3 \cdots x_n\right)^{1/n}.$$
The geometric mean is the nth root of the product of all the measurements.
It is not as easily computed as the arithmetic mean; the computation is eased
somewhat by taking logarithms of both sides of the above equation:
$$\log GM = \frac{1}{n} \sum \log x_i.$$
From this it is apparent that the geometric mean can be computed by taking the
antilog of the arithmetic mean of the logs of the measurements.
Example 3.8
Determine the geometric mean of the three measurements:
$x_1 = 2$, $x_2 = 4$, and $x_3 = 8$.
Solution:
$$GM = \left((2)(4)(8)\right)^{1/3} = 64^{1/3} = 4.$$
Example 3.9
Find the arithmetic and geometric mean of 100, 100, 100 and 1000.
Solution:
$$\bar{x} = \frac{100 + 100 + 100 + 1000}{4} = 325,$$
$$\log GM = \frac{1}{4}\,[2 + 2 + 2 + 3] = \frac{9}{4} = 2.25,$$
$$GM = 10^{2.25} = 177.8.$$
The above example illustrates the fact that the geometric mean is less
affected by one (or two) extremely large (or small) values than is the
arithmetic mean. Unfortunately, the geometric mean is neither easy to
compute nor amenable to use for statistical inferences. It is very useful,
however, in averaging ratios, a process that frequently arises in computing
cost-of-living or other index numbers.
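The logarithmic route to the geometric mean can be sketched in Python as follows, using base-10 logarithms as in Example 3.9; the function name is an illustrative assumption.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logs (values must be positive)."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


print(geometric_mean([2, 4, 8]))
print(round(geometric_mean([100, 100, 100, 1000]), 1))
```

Working with logarithms avoids forming the full product, which can become very large when there are many measurements.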
The upper quartile ($q_3$) is the value of x that exceeds 3/4 of the
measurements and is less than the remaining 1/4.
Example 3.10
Find the lower, middle, and upper quartiles for the data set:
20, 34, 17, 18, 28, 33, 12, 15, 17, 12, 41,
45, 18, 19, 16, 21, 26, 14, 26, 13, 29
Solution:
Ordered from the smallest to the largest, the 21 measurements are:
12, 12, 13, 14, 15, 16, 17, 17, 18, 18, 19, 20, 21, 26, 26, 28, 29, 33, 34, 41, 45
and the quartiles are q1 = 15.25, q2 = 19, and q3 = 28.75.
To determine the first quartile, note that one-fourth of the measurements is
21/4 = 5.25 and three-fourths is 15.75. We wish to find, therefore, the measurement in
the data set such that 5.25 of the measurements are below it and 15.75 are
above it. Of course, no such measurement exists, so to find q1, we must
interpolate between the values of the fifth and sixth measurements, 15 and
16. This results in q1 = 15.25.
Definition 3.9
The Pth percentile
The Pth percentile of a set of n measurements $x_1, x_2, \ldots, x_n$, denoted by P,
is the value of x such that P percent of the values are less than it and (100 -
P) percent of the values are greater than it.
Example 3.11
Find the 85th percentile for the data set in Example 3.10.
Solution:
Since 85 percent of n = 21 is 17.85, we are looking for the measurement
such that 17.85 of the measurements are below it and 3.15 are above. This
value lies between 29 and 33. By interpolation,
P = 29 + 0.85(33 - 29) = 32.4.
By looking at the difference among the quartiles, we can get a feel for the
variability of the data. One measure of variability using the quartiles is the
interquartile range defined by
q3-q1 .
The larger the interquartile range, the more spread out the set of measurements
will be.
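The interpolation used in Examples 3.10 and 3.11 can be sketched in Python. This is one reading of the examples, not a standard library routine: q1 and the 85th percentile are found by counting from the bottom of the ordered data, while q3 is found by counting 5.25 measurements down from the top. The helper `value_at` is hypothetical and expects a position of at least 1.

```python
import math

data = [20, 34, 17, 18, 28, 33, 12, 15, 17, 12, 41,
        45, 18, 19, 16, 21, 26, 14, 26, 13, 29]


def value_at(ordered, position):
    """Interpolate at a (possibly fractional) 1-based position in a list."""
    k = math.floor(position)
    fraction = position - k
    if fraction == 0:
        return ordered[k - 1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


ascending = sorted(data)
descending = ascending[::-1]
n = len(data)

q1 = value_at(ascending, n / 4)      # 5.25 measurements below
q3 = value_at(descending, n / 4)     # 5.25 measurements above
p85 = value_at(ascending, 0.85 * n)  # 17.85 measurements below
iqr = q3 - q1                        # interquartile range
print(q1, q3, round(p85, 1), iqr)
```

Statistical packages use several different interpolation rules for quartiles, so their results may differ slightly from the values obtained here.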
Tutorial 3.2
Construct a frequency distribution with 5 classes for this data. Give the
relative frequencies and construct a histogram from the frequency
distribution. Compute the mean, median, and mode.
3. Find the median, mean, and mode for the data given in Exercises 9 and 10
of Tutorial 2.
4
Measures of Variation
4.1 Introduction
4.2 The range
4.3 The standard deviation & the variance
4.4 The standard deviation & the variance of grouped data
4.5 Measure of relative variation
4.6 Measure of skewness
Tutorial: 4
4.1 Introduction
4.2 The range
Definition 4.1
Range
The range of a set of measurements x1, x2,..., xn is the algebraic
difference between the largest and smallest values.
Example 4.1
Given the following 6 numbers, determine their range.
x1=5 , x2=0 , x3=6 , x4=2 , x5=-2 , x6=9
Solution:
The largest number is 9 while the smallest is -2. Thus the
range is 9 - (-2) = 11.
Figure 4.2 Two distributions with equal ranges
if they were all positive and divided by n, we would obtain a measure of
variation called the mean deviation (see Exercises 5 and 6 on tutorial V).
Unfortunately, this measure of variation has the drawback that, owing to the
absolute values, it is difficult to subject it to any sort of theoretical
treatment; for instance, it is difficult to study mathematically how in
problems of sampling, mean deviations are affected by chance. However,
there exists another way of eliminating the signs of the deviations from the
mean, which is preferable on theoretical grounds: The squares of the
deviations from the mean cannot be negative; in fact, they are positive
unless a value happens to coincide with the mean, in which case $(x_i - \bar{x})^2$ is
equal to zero.
Definition 4.2
Variance
The variance of n measurements $x_1, x_2, \ldots, x_n$, denoted by $s^2$, is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n},$$
and this is how, traditionally, the variance has been defined. Expressing
literally what we have done here mathematically, it has also been called the
mean-square deviation.
Nowadays, it has become the custom among most statisticians and
research workers to make a slight modification in this definition, which
consists of dividing the sum of the squared deviations from the mean by
n- 1 instead of n. Following this practice, which will be explained later,
let us thus formally define the sample variance as
$$s^{*2} = \frac{\sum (x_i - \bar{x})^2}{n - 1}.$$
Definition 4.3
Standard deviation
The standard deviation, denoted by s (or s*), is given by
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
or
$$s^{*} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.$$
The formulas we have given so far in this section are meant to apply to
samples, but if we substitute µ for x‾ and N(or N-1) for n(or n-1), we obtain
analogous formulas for the standard deviation and the variance of a
population. It has become fairly general practice to write population standard
deviations as S when dividing by N and S* when dividing by N - 1;
symbolically,
Definition 4.4
Population Standard Deviation
The population standard deviation of N observations, denoted by S (or S*), is
given by
$$S = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$
or
$$S^{*} = \sqrt{\frac{\sum (x_i - \mu)^2}{N - 1}}.$$
For instance, for n = 5 the estimates would on the average be (5-1)/5 = 0.80
or 80 per cent of what they should be, and hence 20 per cent too small. To
compensate for this we divide by n - 1 instead of n in the formulas for the
sample standard deviation and the sample variance. As the statisticians say,
this makes the sample variance $s^{*2}$ unbiased; that is, if we calculate $s^{*2}$
for several samples taken from the same population, the values we get should
average $S^2$, the variance of the population. Note, however, that this
modification is of no significance unless n is small; generally, its effect is
negligible when n is large, say 100 or more. The same applies to the difference
between $S^2$ and $S^{*2}$, which is negligible unless the size of the population
is very small, and in actual practice this is usually not the case.
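The 80 per cent figure for n = 5 can be checked exactly with a small Python sketch. Everything here is an assumption made for illustration: the three-value population is hypothetical, and samples are drawn with replacement, each possible sample being equally likely. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]  # hypothetical population, N = 3
n = 5                   # sample size, as in the n = 5 illustration


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((Fraction(x) - m) ** 2 for x in values) / divisor


S2 = variance(population, len(population))  # population variance, divisor N

# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))
avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(avg_div_n / S2, avg_div_n_minus_1 / S2)
```

Averaged over all samples, dividing by n gives (n - 1)/n of the population variance, while dividing by n - 1 gives the population variance itself.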
Example 4.2
$$\bar{x} = \frac{12 + 18 + 7 + 11 + 15 + 9}{6} = 12,$$
and then the remainder of the calculations are as shown in the following
table:
  x     x - x̄    (x - x̄)²
 12       0         0
 18       6        36
  7      -5        25
 11      -1         1
 15       3         9
  9      -3         9
Total     0        80
and
$$s^{*} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{80}{6 - 1}} = \sqrt{16} = 4.$$
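The calculation of this example can be reproduced with a short Python sketch that follows the definition step by step; the function name `sample_std` is an illustrative assumption.

```python
import math


def sample_std(values):
    """s*: square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [12, 18, 7, 11, 15, 9]
print(sample_std(data))
```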
the mean were all whole numbers. Had this not been the case, it might have
been profitable to use the following short-cut formula for s*:
$$s^{*} = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$$
This formula does not involve any approximations and it can be derived
from the other formula for s by using the rules for summations. The
advantage of this short-cut formula is that we do not have to go through the
process of actually finding the deviations from the mean; instead we
calculate ∑x, the sum of the x's , ∑ x2, the sum of their squares, and
substitute directly into the formula. Referring again to the burglary data, we
now have
  x      x²
 12     144
 18     324
  7      49
 11     121
 15     225
  9      81
Total
 72     944
$$s^{*} = \sqrt{\frac{6(944) - (72)^2}{6 \cdot 5}} = \sqrt{\frac{480}{30}} = 4.$$
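As a sketch, the short-cut formula needs only the sum of the values and the sum of their squares; the function name below is hypothetical.

```python
import math


def sample_std_shortcut(values):
    """s* from the sums of x and x**2, without computing deviations."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


data = [12, 18, 7, 11, 15, 9]
print(sum(data), sum(x * x for x in data), sample_std_shortcut(data))
```

In floating-point arithmetic this form can lose accuracy when the values are large relative to their spread, which is one more reason to subtract a constant near the mean first.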
  x     x - x̄     (x - x̄)²
 12     2.75       7.5625
  7    -2.25       5.0625
  9    -0.25       0.0625
  5    -4.25      18.0625
  4    -5.25      27.5625
  8    -1.25       1.5625
 17     7.75      60.0625
  2    -7.25      52.5625
 11     1.75       3.0625
 14     4.75      22.5625
 13     3.75      14.0625
  9    -0.25       0.0625
Total
111     0        212.2500
and
$$\bar{x} = \frac{111}{12} = 9.25, \qquad s^{*2} = \frac{212.2500}{11} = 19.3,$$
  x      x²
 12     144
  7      49
  9      81
  5      25
  4      16
  8      64
 17     289
  2       4
 11     121
 14     196
 13     169
  9      81
Total
111    1239
$$s^{*2} = \frac{12(1239) - (111)^2}{12 \cdot 11} = 19.3,$$
which is exactly what we had before. Since the purpose of this trick is to
reduce the size of the numbers with which we have to work, it is usually
desirable to subtract a number that is close to the mean. In our example the
mean was 9.25, and the calculations might have been even simpler if we had
subtracted 9 instead of 10. Although the short-cut formula was given for use
with samples, we have only to substitute N for n throughout to make the
formula applicable to the calculation of $S^{*}$ or $S^{*2}$.
Definition 4.5
Standard deviation for Grouped Data
The standard deviation of grouped data is
$$s^{*} = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{n - 1}}, \qquad n = \sum f_i,$$
where the x's are the class marks and the f's the corresponding class frequencies,
or we use the same kind of coding as in the calculation of the
mean of grouped data.
Example 4.4
To illustrate the use of this short-cut formula for the calculation of s for
grouped data, let us refer again to the distribution of the scores of the 150
applicants. Using the same u-scale, we get
Class mark xi    ui    fi    ui·fi    ui²·fi
14.5    -4    1    -4    16
24.5    -3    6    -18    54
34.5    -2    9    -18    36
44.5    -1    31    -31    31
54.5    0    42    0    0
64.5    1    32    32    32
74.5    2    17    34    68
84.5    3    10    30    90
94.5    4    2    8    32
Total        150    33    359
and
s = c · [(n∑ui²fi − (∑uifi)²) / (n(n − 1))]^{1/2}
= 10 · [(150(359) − (33)²) / (150 · 149)]^{1/2}
= 15.4.
The variation of the scores of the 150 applicants is, thus, measured by a
standard deviation of 15.4, and we shall indicate below how such a figure
might be interpreted.
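The coded calculation of Example 4.4 can be reproduced with a few lines of Python. This is a sketch only; it assumes, as the table suggests, a class interval of c = 10 and a coding origin at the class mark 54.5.

```python
import math

class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [1, 6, 9, 31, 42, 32, 17, 10, 2]
c = 10          # class interval (assumed from the spacing of the class marks)
origin = 54.5   # class mark coded as u = 0

# Code each class mark on the u-scale: u = (x - origin) / c.
u = [round((x - origin) / c) for x in class_marks]

n = sum(freqs)
sum_uf = sum(ui * fi for ui, fi in zip(u, freqs))
sum_u2f = sum(ui * ui * fi for ui, fi in zip(u, freqs))

mean = origin + c * sum_uf / n
s = c * math.sqrt((n * sum_u2f - sum_uf ** 2) / (n * (n - 1)))

print(n, sum_uf, sum_u2f, round(mean, 1), round(s, 1))
```

The totals 33 and 359 match the last row of the table, and the standard deviation rounds to 15.4.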
is being measured. The most widely used measure of relative variation is the
coefficient of variation.
Definition 4.6
Coefficient of Variation
The coefficient of variation of grouped (or non-grouped) data, denoted by
CV, is given by the formula
CV = (s/x̄) · 100.
This simply expresses the standard deviation of a set of data (or distribution)
as a percentage of its mean. When dealing with populations, we analogously
define the coefficient of variation as
CV = (σ/μ) · 100.
If in the above example the standard deviation s = 15.4 and x̄ = 56.7, then
CV = (15.4/56.7) · 100 ≈ 27.2 percent.
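As a quick check of the arithmetic, the coefficient of variation for the scores of the 150 applicants can be computed as follows (a minimal sketch; the function name is illustrative):

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv = coefficient_of_variation(15.4, 56.7)
print(round(cv, 1))
```

Because the standard deviation and the mean carry the same unit, the ratio is unit-free, which is what makes it usable for comparing the variation of data measured on different scales.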
Sk = 3(x̄ − m)/s.
If the distribution is skewed right, the mean will be larger than the median
and Sk will be positive. If the distribution is skewed left, the mean will be
smaller than the median and Sk will be negative.
The effect of dividing by s in Sk is to produce a statistic which is not depen-
dent on the unit of measurement. The mean, median and standard deviation
are all measured in the same units for a given data set.
The skewness measure Sk can be used in two ways. First, the sign of Sk
indicates the direction of skewness: +, skewed right and -, skewed left.
Second, if Sk is larger in magnitude in one data set than in another, the first
data set distribution is more skewed than the other. That is, Sk can be used
as a relative measure of the degree of skewness among data sets.
Example 4.5
Compute Sk for the following set of 5 measurements:
x1 = 10, x2 = 4, x3 = 4, x4 = 6 and x5 = 1.
Solution:
The mean, median and standard deviation for this data set are x̄ = 5, m = 4
and s = 2.97 (computed here with the divisor n = 5; with the divisor n − 1
used earlier in this chapter, s = 3.32 and Sk = 0.90). Therefore,
Sk = 3 · (5 − 4)/2.97 = 1.01.
Since Sk is positive, the distribution of the 5 measurements is skewed to the
right.
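The value of Sk depends slightly on which divisor is used for the standard deviation. The sketch below, using Python's standard `statistics` module, computes Sk for the five measurements both ways; the `population` flag is a hypothetical parameter introduced here for illustration.

```python
import statistics

def pearson_skewness(xs, population=False):
    """Sk = 3(mean - median)/s, with divisor n (population) or n - 1 (sample)."""
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    sd = statistics.pstdev(xs) if population else statistics.stdev(xs)
    return 3 * (mean - median) / sd

data = [10, 4, 4, 6, 1]
sk_n = pearson_skewness(data, population=True)   # divisor n
sk_n_minus_1 = pearson_skewness(data)            # divisor n - 1

print(round(sk_n, 2), round(sk_n_minus_1, 2))
```

Either way the sign is positive, so the conclusion that the data are skewed to the right does not depend on the choice of divisor.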
Tutorial 4
1. Find the range, s², s, CV and Sk of the data of Ex. 6, 7 and 8 of Tutorial 3.
2. The following is the distribution of the percentage of students belonging
to a certain minority group in 40 schools:
Percentage    Frequency
0-4    14
5-9    11
10-14    7
15-19    6
20-24    2
Class Freq.
0-2 10
3-5 6
6-8 3
9-11 1
Calculate:
a. s² and s.
b. The median.
c. The mode.
d. CV.
e. Sk.
4. Suppose that the random variable x has the following table
Class Freq.
0-7 2
8-15 10
16-23 8
24-31 3
32-39 2
5
Introduction to
Probability & Random
Variables
5.1 Introduction
5.2 The sample and event spaces
5.3 Computing probabilities from the sample space
5.4 Permutations, combinations, and other counting rules
5.5 Random variable
5.6 Probability mass function
5.7 Probability density function
Tutorials 5.1 & 5.2
5.1 Introduction
Probability plays an integral role in inferential statistics by building a
"bridge" between the population and the sample taken from it. Our initial
applications of probability in this connection will be to make deductions
about a sample from a known population. The use of probability is as
indicated in figure 5.1: probability reasons from the population to the
sample, while statistical inferences are drawn about the population from the
sample.
Example 5.1
As an example of the use of probability in this context, consider the
national election for the office of president of the United States. Let us sup-
pose that only two candidates are listed on the ballot for the presidency, the
Democratic candidate (A), and the Republican candidate (B). Further,
suppose it is known in the population of registered persons who will vote on
election day that 60 percent will vote for A and 40 percent will vote for B. If
we now randomly sample one person from this population, what is the
probability that he or she will vote for A? Since we know that 60 percent of
the persons will vote for A, the probability that the one sampled person will
vote for A is 0.60. Knowledge of the probabilities of the two possible out-
comes of the experiment (voting for A or voting for B) enables us to deduce
the probability of the outcome in our sample of one.
Indeed, by using this knowledge, we could deduce the probabilities of
zero, one, or two persons voting for A in a sample of two, and so on for
larger sample sizes. Thus, if the population is known in the sense that the
probabilities associated with the values in it are known, then this knowledge
can be used to deduce the probabilities of the outcomes in the sample.
To illustrate the use of probability in making inferences from a sample to the
population, suppose candidate A conjectures before the election that 60
percent of the people in the voting population will vote for him. In order to
check this conjecture, his campaign manager randomly samples ten indi-
viduals from the population and finds that all ten intend to vote for candidate
B. If the probability of a randomly chosen person voting for A is really 0.60,
it is extremely unlikely that ten randomly chosen persons would all vote for
candidate B. It is more likely that the true percentage who will vote for A is
something considerably less than 60 percent. Hence, knowledge of this
experiment (sample outcome) indicates to A that more resources
(campaigning, etc.) may have to be employed if he is to have a chance of
winning the election. Candidate A is interested, of course, in testing whether
this sample is indicative of the population characteristics (voting patterns),
whether more sample information should be obtained, or whether the elec-
tion is likely to go to B (in which case A would be wasting his time by
campaigning further).
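Both directions of reasoning in this example can be made concrete. The following Python sketch assumes that each sampled voter independently votes for A with probability 0.60 (a reasonable approximation when the population is large); it lists every possible sample and adds up the probabilities by the number of votes for A.

```python
from itertools import product

p_a = 0.60                       # assumed known share voting for A
p = {"A": p_a, "B": 1 - p_a}

def votes_for_a_distribution(sample_size):
    """Probability of 0, 1, ..., sample_size votes for A in a random sample."""
    dist = {k: 0.0 for k in range(sample_size + 1)}
    for outcome in product("AB", repeat=sample_size):
        prob = 1.0
        for vote in outcome:
            prob *= p[vote]      # independence assumption
        dist[outcome.count("A")] += prob
    return dist

two = votes_for_a_distribution(2)    # zero, one or two votes for A
ten = votes_for_a_distribution(10)   # ten[0]: all ten vote for B

print(two)
print(ten[0])
```

Under these assumptions the chance that all ten sampled persons vote for B is 0.4 raised to the tenth power, roughly one in ten thousand, which is why such a sample casts doubt on the conjectured 60 percent.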
In practical situations, probability is used as a vehicle in drawing inferences
about unknown population characteristics. Additionally, as we shall see
later, probability concepts can be used to give us an indication of how good
these inferences are.
In this chapter, we will assume the population is known and compute the
probability of the occurrence of various sample outcomes. In effect, we will
be selecting a probability model depicting the outcomes in the population. In
practical applications of statistics, we shall see that the selection of this
model is an integral part of the statistical inference process.
tacting each person to determine the outcome ("1" or "0") is called an
experiment.
Definition 5.1
Experiment
An experiment is a process which results in one and only one outcome
of a set of disjoint outcomes, where the outcomes cannot be predicted
with certainty.
In the voting example, there are only two possible outcomes, and they are
disjoint (non-overlapping): a zero and a one. With our previous assumption
that only two candidates are listed on the ballot, each experiment results in
one and only one of the two possible experimental outcomes. And we cannot
predict with certainty the outcome before the experiment is conducted.
Repeated trials of this experiment will generate the population of zeroes and
ones. Other examples of experiments are:
Example 5.2
A professor at a large university is selected and his salary is recorded.
Example 5.3
A unit of a product is selected from an assembly line and is analyzed to
determine whether it is defective.
Example 5.4
A light bulb is randomly selected from the day's production and its time to
failure measured.
By repeating an experiment many times, a population of outcomes can be
generated. For example, if we repeated the experiment in Example 5.4 until each and
every light bulb in the day's production run had been tested to failure, the
population of all times to failure of this set of light bulbs would have been
generated. In the process of doing this, it should also be noted that the entire
day's production of light bulbs (the population) would have been destroyed.
We can also think of the sample being generated by repeated trials of an
experiment. For example, if we wanted to sample ten light bulbs, we could
repeat the experiment ten times.
The outcomes of an experiment are called simple events. Simple events shall
be denoted by the capital letter E, subscripted so that Ei is associated with a
particular (ith) outcome of an experiment.
Definition 5.2
Simple event
A simple event is the outcome of an experiment.
Example 5.5
Suppose in our presidential election example that we randomly sample two
persons in the population of voters. A possible set of simple events
associated with this experiment is:
[Table: simple events listed by the vote of the first and second person sampled]
The outcome tree represents a logical way to list the simple events of an
experiment. It is very practical if the number of events is not too large.
Example 5.6
Suppose three persons, A, B, and C, are interviewing for a job. Two will
be hired. The experiment is the selection of two of the three interviewed
individuals for the job. The simple events can be listed using an outcome
tree as illustrated in figure 5.3.
Notice in Example 5.6 that the six simple events listed specify not only the
two individuals selected, but also the order in which they are selected. That
is, E1 and E2 both result in the first two individuals, A and B, being selected.
If the order in which the two individuals are selected is not important, then
we need not distinguish between E1 and E2, E3 and E4, and E5 and E6. In this
case, a simpler set of outcomes would be:
E1*: A and B are selected,
E2*: A and C are selected,
E3*: B and C are selected.
As suggested above, it is often possible to define the outcomes and the
experimental simple events differently in the same experiment. To gain an
understanding of how to define the simple outcomes of an experiment,
consider the following example.
Example 5.7
Assume that Herman is to toss a "fair coin" twice. He informs you that
there are three possible outcomes (simple events) of this experiment:
E1*: No heads (two tails),
E2*: One head (one tail),
E3*: Two heads (no tails).
Herman tells you that the probability of any one of the three simple events
occurring is 1/3. He then wishes to wager with you on the outcome of one
trial of the experiment, say E2*, one head occurring in two tosses of the coin.
Before deciding to accept a wager, you construct an outcome tree of a
single trial of the experiment.
From the outcome tree, it is clear that we may define another set of simple
events for this experiment:
E1: (H,H),
E2: (H,T),
E3: (T,H),
E4: (T,T).
If the coin is "fair," then the probability of each of the outcomes E1, E2, E3
and E4 in figure 5.4 occurring is 1/4.
In terms of the original three outcomes, E1*, E2* and E3*, it is clear that
they do not each have a 1/3 probability of occurring. The proper probabilities
are:
P(E1*) = 1/4, [E1* = E4 (T,T)],
P(E2*) = 2/4, [E2* = E2 or E3, (H,T) or (T,H)],
P(E3*) = 1/4, [E3* = E1 (H,H)].
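Herman's claim can be tested by listing the outcomes mechanically. The sketch below (illustrative Python, assuming a fair coin so that the four ordered outcomes are equally likely) groups the ordered outcomes by the number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely ordered outcomes: (H,H), (H,T), (T,H), (T,T).
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))

# Group the ordered outcomes by number of heads (Herman's three events).
heads_count = Counter(outcome.count("H") for outcome in outcomes)
p_heads = {k: v * p_each for k, v in heads_count.items()}

for k in sorted(p_heads):
    print(k, "head(s):", p_heads[k])
```

One head has probability 1/2 rather than 1/3, because two of the four ordered outcomes produce it.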
Figure 5.4 Outcome tree for a coin-tossing experiment (branches: 1st coin toss, 2nd coin toss)
Definition 5.3
Event
An event is a subset of outcomes of an experiment.
Notice that any simple event of an experiment is an event because it is a
single outcome of the experiment.
Definition 5.4
Null event
A null event is an event containing no simple events in an experiment. It
is denoted by ∅.
In Example 5.5, an example of a null event is "no persons vote for A
or for B." It is impossible for this event to happen, because there are
only two candidates, A and B, and the population consists of
individuals who will vote in the election. In this instance, the event
set is empty, for it does not contain any of the simple events in the
experiment.
The simple events of an experiment and events defined to be
collections of these simple events can be portrayed graphically by a
Venn diagram. The Venn diagram associated with the simple events
in Example 5.5 and the event D defined above is shown in
figure 5.5. Each simple event in a
Figure 5.5 A Venn diagram for the simple events in Example 5.5
Definition 5.5
Sample point
A sample point is a simple event in an experiment.
Definition 5.6
Sample space
A sample space is the set of all possible outcomes of an
experiment.
Definition 5.7
Event space
An event space is the collection of sample points
corresponding to an event defined over the sample space.
Figure 5.6 Venn diagram for the experiment shown in Figure 5.3
The Venn diagram showing the sample space for this experiment, the sample
points, and the event spaces F, G, and H is illustrated in figure 5.6.
In Example 5.5, where two persons are randomly selected from the voting
population, there are four possible outcomes of the experiment. In the cor-
responding sample space for this experiment, illustrated in Figure 5.5, we
defined the event D to be, "at least one of the two persons votes for
candidate A." What is the probability that the event D occurs in this
experiment? This question can be answered directly from the sample space
associated with the experiment if the probabilities of the simple events
occurring are known. Thus, to answer a probability question about an event
in an experiment, we first must assign probabilities to the simple events
associated with the experiment. We will denote by P(Ei) the probability
assigned to the simple event Ei. The assigned probabilities P(E1), P(E2), ...,
P(EM), where there are M simple events in the experiment sample space,
must satisfy three probability axioms for experiments that have a finite
number of outcomes.
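Axiom 1
0 ≤ P(Ei) ≤ 1 for each simple event Ei.
Axiom 2
P(E1) + P(E2) + ... + P(EM) = 1.
The second axiom requires that the probabilities assigned to all the simple
events in the experiment must total one.
Axiom 3
The probability that any member of a collection of simple events will occur
is the sum of the probabilities of the members in the collection.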
Example 5.8
Suppose in Example 5.6 we assign the following probabilities to the simple
events in that experiment:
These probability assignments satisfy the above three axioms: each
probability assigned is a positive number between 0 and 1 inclusive, the
probabilities total one, and the probability that any member of a collection of
the simple events will occur is the sum of the probabilities of the members in
the collection.
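The three requirements described above are easy to check by machine. The sketch below is illustrative only: the equal probabilities of 1/6 are a hypothetical assignment for six simple events, not the values used in Example 5.8, and the function names are invented for this illustration.

```python
from fractions import Fraction

def satisfies_axioms(probs):
    """Check that each probability lies in [0, 1] and that they total one."""
    in_range = all(0 <= p <= 1 for p in probs.values())
    total_one = sum(probs.values()) == 1
    return in_range and total_one

def event_probability(probs, event):
    """Add the probabilities of the simple events that make up the event."""
    return sum(probs[e] for e in event)

# Hypothetical assignment for six simple events E1, ..., E6.
probs = {f"E{i}": Fraction(1, 6) for i in range(1, 7)}

print(satisfies_axioms(probs))
print(event_probability(probs, ["E1", "E2"]))
```

Exact fractions are used so that the check "the probabilities total one" is not disturbed by rounding.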
Definition 5.8
Relative frequency definition of probability
If an event E is defined in an experiment, the experiment is repeated a very
large number of times, say N, and the event E is observed to occur in n of
these N experimental trials, then
P(E) = n/N.
The ratio n/N represents the proportion of the time that event E occurs in
repeated experiments.
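The relative frequency definition lends itself to simulation. The following Python sketch (the helper names and the fixed seed are choices made for this illustration) repeats the two-coin experiment many times and records how often exactly one head appears.

```python
import random

def relative_frequency(event, trial, n_trials, seed=0):
    """Proportion of n_trials repetitions of `trial` in which `event` occurs."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

def toss_two_coins(rng):
    """One trial: two tosses of a fair coin."""
    return (rng.choice("HT"), rng.choice("HT"))

def exactly_one_head(outcome):
    return outcome.count("H") == 1

estimate = relative_frequency(exactly_one_head, toss_two_coins, 100_000)
print(estimate)
```

With a large number of trials the ratio n/N should settle near 1/2, the probability found from the outcome tree for one head in two tosses.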
Rule 5.1
m · n rule
Suppose that there are m distinguishable objects in one group and n
distinguishable objects in another group. If one element is selected from each
group, it is possible to form m · n pairs of objects.
Example 5.9
Herman has decided to purchase a new hi-fi system with the money he
saved by buying a compact car instead of a large sedan. His hi-fi system will
be composed of a receiver, a pair of speakers, a record changer, and a tape
deck. In the store where he will make the purchase, there are 10 different
kinds of receivers, 5 kinds of speakers, 4 kinds of changers, and 8 kinds of
tape decks. How many systems can Herman choose from?
Solution:
Since he must select one element from each of the four groups, he can
choose from (10)(5)(4)(8) = 1600 possible systems.
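The count can be confirmed in two ways: by multiplying the group sizes, as the rule prescribes, and by actually listing every system. A minimal Python sketch (component names taken from the example, items represented only by index numbers):

```python
from itertools import product
from math import prod

choices = {"receiver": 10, "speakers": 5, "changer": 4, "tape deck": 8}

# m . n rule extended to four groups: multiply the group sizes.
count_by_rule = prod(choices.values())

# Brute force: list one item from each group in every possible way.
count_by_listing = sum(1 for _ in product(*(range(k) for k in choices.values())))

print(count_by_rule, count_by_listing)
```

Both counts agree, which is the content of the rule applied repeatedly to more than two groups.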
from three is 3, while the number of permutations of two persons drawn
from three is 6.
In conjunction with the rules for computing the number of permutations and
combinations, the complete definitions follow.
Definition 5.9
Permutations
An ordered arrangement of r distinguishable objects is called a
permutation. The number of permutations of r objects taken from n
distinguishable objects will be denoted by Prn.
Definition 5.10
Combinations
A set of r distinguishable objects is called a combination. The number
of combinations of r objects taken from n distinguishable objects will be
denoted by Crn.
Rule 5.2
Permutations
Prn = n!/(n − r)!
Rule 5.3
Combinations
Crn = n!/(r!(n − r)!) = (1/r!) Prn
or more different people in it. We are only concerned about the content of
each group, and therefore we want to determine the number of combinations
of three things taken from five.
Example 5.12
A club committee of three is to be formed by selecting three people from a
group of five. One of the selected people will be chosen as chairman of the
committee, another the secretary, and the third person will simply be a
"member" of the committee. How many different committees can be
formed?
Solution:
Suppose the three people (denoted by A, B, and C, respectively) have been
chosen from among the five. Once we have this combination of three people,
we must then assign them to the three positions: chairman, secretary, and
member.
We can view this process as ordering the three persons. That is, let the first
position in the ordering (or permutation) be the chairman, the second be the
secretary, and the third be the member. The possible permutations are: ABC,
ACB, BAC, BCA, CAB, and CBA. Each of these permutations is a different
committee, although each contains the same three people, A, B, and C. Thus,
we want to compute the number of permutations of r = 3 people chosen from
n = 5.
P35 = 5!/(5 − 3)! = 120/2 = 60
In this case, there are 60 different committees which can be formed. Notice
that this is six times as many committees as could be formed in Example
5.11, because for each combination in Example 5.11 there are 6 permutations
in Example 5.12.
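The relation between the two counts can be verified by listing the committees. In the Python sketch below the five people are labelled A to E for illustration; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

people = "ABCDE"   # five people, labelled for illustration

# Unordered committees of three (combinations).
committees_unordered = list(combinations(people, 3))

# Committees with chairman, secretary and member assigned (permutations).
committees_with_roles = list(permutations(people, 3))

print(len(committees_unordered), math.comb(5, 3))
print(len(committees_with_roles), math.perm(5, 3))
```

The ratio of the two counts is 3! = 6, the number of ways of assigning the three roles within any one group of three.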
Tutorial 5.1
1. Give the sample space of each of the following experiments in the form of
a Venn diagram. Be certain to define the simple events corresponding to
the sample points in the sample space.
a. A fair coin is tossed three times.
b. A coin and a die are tossed together.
c. Two fair dice are tossed, and the sum of the dots of the two faces
turning up is recorded.
d . A student receives his score on a multiple choice exam containing 20
questions.
e. A student receives his grade on an exam.
f. The number of telephone calls received at a switchboard during a
five-minute interval is recorded.
g. A child is selected in a first grade class, and his or her weight (to the
nearest pound) is recorded.
2. A committee is composed of two men and two women. One member of
the committee is selected to serve as chairman and another is selected to
serve as secretary.
a. Define the simple events comprising the sample space of this
experiment. Identify which sample points in the sample space belong to
the following event spaces:
b. The younger man is selected as chairman: Event A.
c. A man is selected as chairman: Event B.
d. A woman is selected as secretary: Event C.
e. Events A and C occur: Event D.
f. Events B or C or both occur: Event E.
g. Show the five event spaces in a Venn diagram.
3. Two college job recruiting officers, Herman and Bill, come to the
University of Truth campus to fill positions in their organizations. Each
officer is attempting to fill three positions. Three students qualify for the
positions described, and each will be interviewed by the two officers. If a
sample point is defined as a specific number of students hired by Herman
or Bill, define the following events as specific collections of sample
points. Note: there are six jobs and three students, so three jobs will not be
filled. (Hint: The sample space is two-dimensional.)
a. The sample space S which consists of all outcomes defining the
number of students hired by each officer.
b. Event A: Herman hires at least two students.
5. A bowl contains 17 red balls, 10 black balls, 10 white balls and 20 blue balls.
If one of these is drawn at random, what are the probabilities that it will
be: a) red, b) white, c) blue, d) red or white, e) white or
blue, f) neither white nor red?
5.5 Random Variable
Definition 5.11
Random variable
A random variable is a numerically valued function whose value is
determined by a random experiment.
may have the same grade point average.
Example 5.14
Suppose the random experiment is tossing a coin twice, and we define the
random variable, X = number of heads in the two tosses. Figure 5.7
illustrates the correspondence between members of the experimental
outcomes and possible values of the random variable.
Notice that the random variable X is a function. To each member in the first
set (the simple events in the experimental sample space) there corresponds
one and only one member in the second set (the values of the random
variable). But each value of the random variable may correspond to one or
more simple events. Notice that we have written the random variable in
functional notation in figure 5.7 to emphasize its meaning.
Example 5.15
A production lot of 100 transistor radios contains 10 defectives. A retailer
decides to select two of the radios at random, and by extensive testing, to
determine whether they are defective. If neither radio is defective, he will
accept the lot. Define the random variable, X = number of defective radios
(X = 0, 1, or 2). Determine the probabilities that X assumes each of its
three possible values.
Solution:
In the experiment of selecting two radios at random, define the events:
A1: First radio is defective
A2: Second radio is defective
A3: First radio is not defective
A4: Second radio is not defective
The four simple events of the experiment are given in Table 5.1. The
probability of each simple event occurring is determined by using the
multiplicative law. For example,
P(A1 ∩ A2) = P(A1) P(A2 | A1) = (10/100)(9/99) = 0.0091.
Notice that the two events are not independent; the outcome of the first
selection affects the chance of the second radio being defective. Notice that
the probabilities in Table 5.1 sum to one: we have specified the four mutually
exclusive and collectively exhaustive simple events of the experiment.
Table 5.1 Outcomes and probabilities for Example 5.15
Simple event    Probability    Value of X
E1: A1∩A2    (10/100)(9/99) = 0.0091    2
E2: A1∩A4    (10/100)(90/99) = 0.0909    1
E3: A3∩A2    (90/100)(10/99) = 0.0909    1
E4: A3∩A4    (90/100)(89/99) = 0.8091    0
Since the simple events are mutually exclusive and collectively exhaustive,
P(X = 2) = 0.0091, P(X = 1) = 0.0909 + 0.0909 = 0.1818, and P(X = 0) =
0.8091. The values of the random variable X and their probabilities of
occurrence are given in Table 5.2. This table represents the probability
distribution of the random variable X, that is, a list of each value and its
probability of occurrence.
Table 5.2 Probability distribution of X
X P(x)
0 0.8091
1 0.1818
2 0.0091
From the probability distribution the probability that the retailer will accept
the lot is 0.8091.
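The same distribution can be obtained by counting samples instead of applying the multiplicative law step by step. The Python sketch below uses the combinations rule of Section 5.4 (a different route from the one taken in the solution above); the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def defectives_distribution(lot_size, defectives, sample_size):
    """P(X = x) for X = number of defectives in a sample drawn without replacement."""
    total = comb(lot_size, sample_size)   # number of equally likely samples
    good = lot_size - defectives
    return {
        x: Fraction(comb(defectives, x) * comb(good, sample_size - x), total)
        for x in range(sample_size + 1)
    }

dist = defectives_distribution(100, 10, 2)
for x, p in dist.items():
    print(x, round(float(p), 4))
```

The probabilities agree with Table 5.2, and P(X = 0) is the probability that the retailer accepts the lot.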
Table 5.3 Examples of discrete random variables
Definition 5.12
Discrete random variable
A random variable is discrete if its set of values is finite or countably
infinite in number.
A continuous random variable assumes values which occur on an interval or
intersection of intervals on the real line. The number of values that a
continuous random variable may assume is infinite. Examples of continuous
random variables include height, weight, and the diameter of ball bearings.
Although each of these variables is bounded (for example, the weight of
individuals is bounded between zero lbs. and, say 500 lbs.), the variable can
assume any of an infinite number of values between these bounds. Other
examples of continuous random variables are given in Table 5.4.
Table 5.4 Examples of continuous random variables
Definition of X    Range of values of X
1. Diameter of 1/2 bolts produced in a machine shop    X ≥ 0
2. Weight of a student in your class    X ≥ 0
3. Amount of rainfall in inches recorded at a weather station on a given day    X ≥ 0
4. Weight lost by a person weighing 300 lbs. on a diet designed for weight loss (a negative value of X indicates a weight gain)    X ≤ 300
Definition 5.13
Continuous random variable
A random variable is continuous if it may assume all real number values in
an interval.
5.6 Probability Mass Function
2. ∑_{i=1}^{r} P(xi) = 1.
Figure 5.8 Three ways of presenting the distribution of the
discrete random variable in Example 5.15
Example 5.16
A committee of two persons is to be formed from four persons, three of
whom are female. The committee is formed anew at the end of each working
week. The members of the committee have the duty of arriving at the office
30 minutes early to prepare the morning coffee, turn on lights, plug in
machines and so on. In any given week, let X be the number of women
serving on the present committee. Form the probability distribution of the
random variable X.
Figure 5.9 Probability mass function for the random variable in Example
5.16
The average of the values of a random variable is called the expected value
of the random variable and is given for a discrete random variable:
Definition 5.15
Expected value of a random variable (discrete case)
Let X be a discrete random variable with a finite number of values denoted
by x1, x2, ..., xr. The mean or expected value of the random variable,
denoted by E(X), is given by
E(X) = ∑_{i=1}^{r} xi · P(xi).
Example 5.17
Table 5.7
xi P(xi) xi·P(xi)
0 0.8091 0.0000
1 0.1818 0.1818
2 0.0091 0.0182
Total 0.2000
Definition 5.16
Variance of a random variable (discrete case)
Let X be a discrete random variable with a finite number of values denoted
by x1, x2, ..., xr. The variance of the random variable, denoted by V(X), is
given by
V(X) = ∑_{i=1}^{r} (xi − E(X))² P(xi), or
V(X) = ∑_{i=1}^{r} xi² P(xi) − [E(X)]².
The first form of V(X) in definition 5.16 is a definitional form, and the
second is the "computing form." Giving two expressions for V(X) is similar
to giving two expressions for the population variance σ2 in chapter 4.
Example 5.18
Compute the variance of the random variable in Example 5.15.
Solution:
The easiest way to compute the variance of a discrete random variable is by
using a table similar to Table 5.8 below
Table 5.8 Partial computation of the variance of the random variable in
Example 5.15
i    xi    P(xi)    xi²    xi²P(xi)
1    0    0.8091    0    0.0000
2    1    0.1818    1    0.1818
3    2    0.0091    4    0.0364
Total                0.2182
From Table 5.7, E(X) = 0.2.
Thus
V(X) = ∑_{i=1}^{3} xi² P(xi) − [E(X)]² = 0.2182 − (0.2)² = 0.1782.
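The tabular computations of the expected value and the variance translate directly into code. The Python sketch below (function names are illustrative) uses the distribution of Table 5.2 and evaluates the variance by both the definitional and the computing form.

```python
# Probability distribution of X from Table 5.2 (number of defective radios).
dist = {0: 0.8091, 1: 0.1818, 2: 0.0091}

def expected_value(dist):
    """E(X): sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def variance_definitional(dist):
    """V(X): sum of (x - E(X))^2 * P(x)."""
    mean = expected_value(dist)
    return sum((x - mean) ** 2 * p for x, p in dist.items())

def variance_computing(dist):
    """V(X): sum of x^2 * P(x), minus E(X)^2."""
    mean = expected_value(dist)
    return sum(x * x * p for x, p in dist.items()) - mean ** 2

print(round(expected_value(dist), 4))
print(round(variance_definitional(dist), 4), round(variance_computing(dist), 4))
```

The two forms of the variance agree up to floating-point rounding, as they must, since one is an algebraic rearrangement of the other.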
2. The area under f(x) from X= a to X= b must be one.
P(c ≤ X ≤ d) = ∑_{x=c}^{d} P(x), d ≥ c.
e3, . . . , in the interval (a, b). Since there is an infinite number of values (not
countable) between (a, b), we can see that the sum of the "probabilities" f(e)
+ f(e1) + f(e2) + f(e3) + ... will quickly exceed one, which means that any
particular number, say f(e2), can no longer be interpreted as a probability.
Indeed, a single f(ei) may exceed one by itself if the height of the curve at
the point X = ei is greater than one.
Another way to look at this is to write P(X = e) as P(e ≤ X≤ e) and use the
definition for the probability that X assumes a value between two points,
say c and d, as illustrated in figure 5.9. Since there is no area between e and
e, P(X = e) = P(e ≤ X≤ e) = 0.
Example 5.18
Consider the probability density function
f(x) = 1 for 0 ≤ x ≤ 1,
f(x) = 0 for x < 0 or x > 1.
Find :
1. P(0 ≤ x ≤ 1)
2. P(0.25 ≤ x ≤ 0.60)
3. P(x > 0.75)
Solution:
1. P(0 ≤ X ≤ 1) = ∫_{0}^{1} f(x) dx = ∫_{0}^{1} 1 dx = 1.
2. P(0.25 ≤ X ≤ 0.60) = ∫_{0.25}^{0.60} f(x) dx = 0.60 − 0.25 = 0.35.
3. P(X > 0.75) = ∫_{0.75}^{1} f(x) dx = 1 − 0.75 = 0.25.
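For a density as simple as this one the areas are rectangles, but the same probabilities can be approximated numerically for any density. The Python sketch below uses a basic midpoint rule; the integrator is a hand-written helper for this illustration, not a library routine.

```python
def uniform_pdf(x):
    """Density of the example: 1 on [0, 1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p1 = integrate(uniform_pdf, 0.0, 1.0)    # P(0 <= X <= 1)
p2 = integrate(uniform_pdf, 0.25, 0.60)  # P(0.25 <= X <= 0.60)
p3 = integrate(uniform_pdf, 0.75, 1.0)   # P(X > 0.75); f is 0 beyond 1

print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The areas come out as 1, 0.35 and 0.25, the lengths of the corresponding intervals, since the height of the density is one throughout.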
Definition 5.18
Expectation of a random variable (continuous case)
Let X be a continuous random variable defined over the interval (a, b) with
probability density function f(x). The mean or expected value of the
random variable, denoted by E(X), is given by
E(X) = ∫_{a}^{b} x f(x) dx.
Definition 5.19
Variance of a random variable (continuous case)
Let X be a continuous random variable defined over the interval (a, b)
with probability density function f(x). The variance of the random
variable, denoted by V(X), is given by
V(X) = ∫_{a}^{b} [x − E(X)]² f(x) dx, or
V(X) = ∫_{a}^{b} x² f(x) dx − [E(X)]².
It is interesting to note the similarity in the computation of the expected
value and the variance of a discrete and a continuous random variable as
illustrated in Table 5.10. The analogy to summing in the discrete case is
integrating in the continuous case.
Table 5.10 Comparison of the expected value and variance formulas
for discrete and continuous random variables
Example 5.19
Let
f(x) = 0.5 for 0 ≤ x ≤ 2,
f(x) = 0 otherwise.
Find:
a. E(X)
b. Var(X)
Solution:
E(X) = ∫_{0}^{2} (1/2) x dx = [x²/4]_{0}^{2} = 1,
Var(X) = ∫_{0}^{2} (1/2) x² dx − (1)² = 4/3 − 1 = 2/6 = 1/3.
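The two integrals can be checked numerically. The sketch below approximates them with a simple midpoint rule written for this illustration (the helper `integrate` is not a library function).

```python
def pdf(x):
    """Density of the example: 0.5 on [0, 2], 0 elsewhere."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

mean = integrate(lambda x: x * pdf(x), 0.0, 2.0)
variance = integrate(lambda x: x * x * pdf(x), 0.0, 2.0) - mean ** 2

print(round(mean, 4), round(variance, 4))
```

The results approximate E(X) = 1 and Var(X) = 1/3, in agreement with the exact integration.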
Tutorial 5.2
1. In a certain population of voters, it is known that 60 percent are
Democrats and 40 percent are Republicans. If a sample of three voters is
extracted from this population, find the probability distribution of the
random variable, X = Number of Democrats in the sample.
3. A multiple choice exam consists of four questions, each of which has four
possible answers. If a student is forced to guess on all four questions, what is
the probability distribution of the random variable, X = Number of correct
guesses?
6. Suppose two dice are tossed. Let X be the sum of the dots on the top faces
of the two dice. Find the probability distribution of X.
X P(x)
0 0.70
1 0.20
2 0.06
3 0.04
Find:
a. The expected value of X.
b. The variance of X.
c. The mode of X.
d. The range of X.
X P(x)
0 0.95
1 0.03
2 0.015
3 0.003
4 0.0015
5 0.0005
Find:
Problem 10 simulation experiment. Does X̄, the sample mean, provide a
good estimate of E(X)? Should X̄ provide a good estimate of E(X)?
Why?
12. From the most recent national census, it is found that the number of
children (X) in American families follows the following probability
distribution:
Number of children, X    Proportion of families, P(x)
0    0.48
1    0.20
2    0.15
3    0.08
4    0.05
5    0.03
6    0.01
It is assumed that the proportion of families with more than six children
is negligible.
a. Find the expected value and standard deviation of X.
b. Form a stick diagram of this distribution. Is the distribution skewed?
c. Find Var(X).
f(x)=
0, otherwise.
Calculate :
a. P(−1 < X < 0),
b. P(X > 1),
c. P(X = 2),
d. E(X),
e. Var(X).
6
Binomial & Normal
distributions
6.1 Probability Function
Example 6.1
The first of the two tables which follow was easily obtained on the basis of
the assumption that each face of the die in question has a probability of 1/6
and the second was obtained by considering as equally likely the eight
possible outcomes HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT of
three flips of a coin, where H stands for heads and T for tails:
Note that in each of these examples the sum of all the probabilities is 1.
Note also that since the values of probability functions are probabilities, they
must always be positive or zero, and cannot exceed 1.
Whenever possible, we try to express probability functions by means of
formulas which enable us to calculate the probabilities associated with the
various values of a random variable. With the usual functional notation,
we can thus write
f(x)=1/6 for x=1,2,...,6,
for the first of the above examples, where f(1) represents the probability of
rolling a 1, f(2) represents the probability of rolling a 2, and so on.
There are many applied problems in which we are interested in the proba-
bility that an event will take place x times in n "trials," or in other words, x
times out of n, while the probability that it will take place in any one trial is
some fixed number p and the trials are independent. We may thus be
interested in the probability of getting 24 responses to 80 mail question-
naires, the probability that in a sample of 50 voters 32 will favor Candidate
A, the probability that 3 of 10 laboratory mice react positively to a new drug,
and so on. Referring to the occurrence of any one of the individual events as
a "success", we are thus interested in the probability of getting x successes in
n trials. To handle problems of this kind, which incidentally include, we use
a special probability function, that of the binomial distribution.
If p denotes the probability of a success on any given trial, the probability
of getting x successes in n trials (and hence, x successes and n - x failures) in
some specific order is $p^x (1-p)^{n-x}$. There is one factor p for each success, one
factor 1 - p for each failure, and the x factors p and n - x factors 1 - p are all
multiplied together by virtue of the assumption that the n trials are
independent. Since this probability is the same for each point of the sample
space where there are x successes and n - x failures (it does not depend on the
order in which the successes and failures are obtained), the desired
probability for x successes in n trials in any order is obtained by multiplying
$p^x (1-p)^{n-x}$ by the number of points of the sample space (that is, individual
outcomes) where there are x successes and n - x failures. In other words,
$p^x (1-p)^{n-x}$ is multiplied by the number of ways in which the x successes can be
distributed among the n trials, namely, by $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$. We have thus arrived at
the following result:
Definition 6.1
Binomial distribution
The probability of getting x successes in n independent trials is given
by
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, 2, \ldots, n$,
where p is the constant probability of a success for each individual
trial.
It is customary to say that the number of successes in n trials is a
random variable having the binomial probability distribution, or
simply the binomial distribution. The terms "probability distribution"
and "probability function" are often used interchangeably, although
some persons make the distinction that the term "probability
distribution" refers to all the probabilities associated with a random
variable, and not only those given directly by its probability function.
Incidentally, we refer to this distribution as the binomial distribution
because for x = 0, 1, 2, ..., n, the values of its probability function
are given by the successive terms of the binomial expansion of
$[(1-p) + p]^n$.
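As a brief check on this remark, the binomial theorem also shows why the probabilities of Definition 6.1 sum to 1. The line below is a sketch in standard notation; the symbol q = 1 - p is introduced here only for brevity and is not used elsewhere in the text.

```latex
\[
\sum_{x=0}^{n} \binom{n}{x} p^{x} q^{\,n-x} = (q + p)^{n} = 1^{n} = 1, \qquad q = 1 - p .
\]
```

For n = 2, for instance, the three terms of the expansion are $q^2$, $2pq$, and $p^2$, which are f(0), f(1), and f(2) respectively.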
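Example 6.2
To illustrate the use of the above formula, let us first calculate the
probability of getting 5 heads and 7 tails in 12 flips of a balanced coin.
Substituting x = 5, n = 12, p = 1/2, and $\binom{12}{5} = \frac{12!}{5!\,7!} = 792$, we get
$f(5) = 792\,(1/2)^5 (1 - 1/2)^{12-5} = 99/512$,
or approximately 0.19. Similarly, for x = 7, n = 10, and p = 4/5, with $\binom{10}{7} = 120$, we get
$f(7) = 120\,(4/5)^7 (1 - 4/5)^{10-7}$,
or approximately 0.20.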
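The two substitutions in Example 6.2 can be checked mechanically. The following is a minimal sketch, not part of the original notes; the function name `binomial_pmf` is an illustrative choice, and exact fractions are used so that 99/512 is reproduced without rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (Definition 6.1)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact arithmetic with Fraction avoids rounding in the worked values.
f5 = binomial_pmf(5, 12, Fraction(1, 2))   # 5 heads in 12 flips
f7 = binomial_pmf(7, 10, Fraction(4, 5))   # 7 successes in 10 trials, p = 4/5

print(f5, round(float(f5), 4))
print(f7, round(float(f7), 4))
```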
Remark 6.1
Some probability values of the binomial distribution are given in Table
1 (in the appendix).
Example 6.3
where all the answers are rounded to three decimals. A histogram of this
distribution is shown in figure 6.1.
Figure 6.1 Histogram of binomial distribution with n=5 and p=0.60.
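The distribution pictured in Figure 6.1 can be tabulated directly. The sketch below assumes, from the figure caption, that the distribution in question is binomial with n = 5 and p = 0.60, and it rounds to three decimals as Example 6.3 does. The function name and the crude text histogram are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.60   # values taken from the caption of Figure 6.1
table = {x: round(binomial_pmf(x, n, p), 3) for x in range(n + 1)}

for x, prob in table.items():
    # Crude text histogram: one '#' per 0.02 of probability.
    print(f"{x}  {prob:.3f}  {'#' * round(prob / 0.02)}")
```

The longest bar falls at x = 3, which is also the integer nearest the mean np = 3.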
Figure 6.2 Some specific members of the binomial distributions, where:
(a) p = 0.2, (b) p = 0.5, (c) p = 0.8.
Remark 6.2 If X is a binomial random variable, then its mean E(X) and
variance V(X) are given by:
E(X) = np, V(X) = np(1 - p).
Example 6.4
Consider Figure 6.2 and find the mean and the variance for each p.m.f.
Solution:
i) E(X) = 2(0.2) = 0.4,
Var(X) = 2(0.2)(0.8) = 0.32,
ii) E(X) = 2(0.5) = 1.0,
Var(X) = 2(0.5)(0.5) = 0.50,
iii) E(X) = 2(0.8) = 1.6,
Var(X) = 2(0.8)(0.2) = 0.32,
iv) E(X) = 5(0.2) = 1.0,
Var(X) = 5(0.2)(0.8) = 0.80,
v) E(X) = 5(0.5) = 2.5,
Var(X) = 5(0.5)(0.5) = 1.25,
vi) E(X) = 5(0.8) = 4.0,
Var(X) = 5(0.8)(0.2) = 0.80.
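The shortcut formulas of Remark 6.2 can be compared with the mean and variance computed from the probability function itself. The sketch below does this for the six (n, p) pairs used in the solution; the helper names are illustrative and not from the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_variance(n, p):
    """Mean and variance computed from the p.m.f., without using np or np(1-p)."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(probs))
    var = sum((x - mean) ** 2 * f for x, f in enumerate(probs))
    return mean, var

for n in (2, 5):
    for p in (0.2, 0.5, 0.8):
        mean, var = mean_and_variance(n, p)
        print(f"n={n} p={p}: mean={mean:.2f} (np={n * p:.2f}), "
              f"var={var:.2f} (np(1-p)={n * p * (1 - p):.2f})")
```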
The shape of the normal distribution is that of a bell: it has a single mode and is symmetric about its
central value. The flexibility in using the normal distribution is due to the
fact that the curve may be centered over any number on the real line and that
it may be made flat or peaked to correspond to the amount of dispersion in
the values of a random variable. Many quantitative characteristics have
distributions similar in form to the normal distribution's bell shape.
Examples of random variables that have been modeled successfully by the
normal distribution are the height and the weight of people, the diameters of
bolts produced by a machine, the IQ of people, the life of batteries or light
bulbs, and so on. Typically, the type of experiment that produces a random
variable that can be successfully approximated by a normal random variable
is one in which the values of the random variable are produced by a measuring
process, where it is known that the measurements tend to cluster symmetrically
about a central value. A random variable that is an average or a
sum of values of another random variable is, under very general conditions,
almost always distributed approximately as a normal random variable,
regardless of the form of the distribution of the random variable whose
values are summed or averaged. An example of such a random variable is
the mean grade point average of a randomly selected group of students. The
notion that a random variable that is an average is distributed as a normal
random variable is discussed in the next chapter with the central limit
theorem.
Unfortunately, knowing that the distribution of a random variable is
symmetric with a single mode does not guarantee that the random variable is
normal. There are other distributions in
statistics that are unimodal and symmetric. However, they can often be
modeled successfully by the normal distribution. For a random variable to be
normally distributed, the mathematical expression delineating the form of
the bell must be of a specific type, as described in the following definition.
Definition 6.2
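For reference, the standard form of the normal probability density function, which is what a definition of this kind states, is sketched below. Here μ is the mean and σ > 0 is the standard deviation, matching the parameters discussed in Remark 6.3.

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad -\infty < x < \infty .
\]
```

The parameter μ fixes where the bell is centered, and σ controls how flat or peaked it is.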
6.3.1 Mean and variance of the normal random variable
The mean and variance of the normal random variable may be determined
by performing the following integrations:
As might be suspected, these integrals are not easily evaluated. The results
of integration are rather simple, however, and are given in the following
theorem.
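A numerical check can stand in for the integrations. The sketch below assumes the standard form of the normal density and uses hypothetical parameter values (μ = 10, σ = 2); it approximates the integrals for the total area, E(X), and V(X) with a simple midpoint rule, and the results should come out close to 1, μ, and σ².

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Standard form of the normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 10.0, 2.0                        # hypothetical parameter values
lo, hi = mu - 12 * sigma, mu + 12 * sigma    # the tails beyond this range are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

print(round(total, 6), round(mean, 6), round(var, 6))
```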
Remark 6.3
Notice that the mean depends only on the parameter µ, and that the variance
depends only on the parameter σ. Thus, the normal distribution may be
located over its central value on the real line independently of the amount of
dispersion σ² specified for the distribution. Contrast this with the binomial
distribution (and others discussed thus far) in which the mean and the
variance both depend upon the parameters n and p [E(X) = np,
V(X) = np(1 - p)] and hence are not independent of one another. This
property of the normal distribution adds immeasurably to its flexibility in
modeling the distributions of non-normal random variables.
We will now return to the problem of computing probabilities associated
with a normal random variable.
Definition 6.3
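Standard normal distribution
The normal distribution with mean μ = 0 and standard deviation σ = 1 is
called the standard normal distribution; a random variable with this
distribution is denoted by Z.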
Figure 6.4 Standard normal distribution
Example 6.5
Find the area under the standard normal distribution curve for each of the
intervals listed below.
a. Between Z = 0 and Z = 2.0
b. Between Z=-1.28 and Z = 0.0
c. Between Z=-0.58 and Z= 2.54
d. Between Z = 1.20 and Z = 2.44
e. Greater than Z = 2.87
Solution:
b. Since the normal distribution is symmetric, the area between - 1.28 and
0.0 is equal to the area between 0.0 and 1.28. Thus, proceed down the
leftmost column until 1.2 is reached. Select the ninth column marked 0.08.
The resulting number in the table is 0.3997.
c. We may determine this area in two parts: total area = (area between 0.0
and 2.54) + (area between -0.58 and 0.0). The area between 0.0 and 2.54 is
0.4945 from Table 2. The area between -0.58 and 0.0 is the same as the area
between 0.0 and 0.58, which is 0.2190. The answer is 0.4945 + 0.2190 =
0.7135.
d. We may determine this area by taking the difference between the area from
0 to 2.44 and the area from 0 to 1.20. The area between 0 and 2.44 is 0.4927,
and the area between 0 and 1.20 is 0.3849. Thus, the area between 1.20 and 2.44 is
0.4927 - 0.3849 = 0.1078.
e. Since the area between 0 and +∞ is 0.5, we can determine the area from
2.87 to ∞ by subtracting the area from 0 to 2.87 (0.4979) from 0.5:
0.5000 - 0.4979 = 0.0021.
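The table lookups in Example 6.5 can be cross-checked with the cumulative distribution function of the standard normal distribution. The following sketch uses `NormalDist` from Python's standard `statistics` module; the helper `area_between` is an illustrative name, not something defined in the text.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(a, b):
    """Area under the standard normal curve between a and b (with a <= b)."""
    return Z.cdf(b) - Z.cdf(a)

areas = {
    "a": area_between(0.0, 2.0),
    "b": area_between(-1.28, 0.0),
    "c": area_between(-0.58, 2.54),
    "d": area_between(1.20, 2.44),
    "e": 1.0 - Z.cdf(2.87),
}

for part, area in areas.items():
    print(part, round(area, 4))
```

Small differences in the fourth decimal place from the table-based answers can arise because table entries are rounded before they are added or subtracted.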
Example 6.6
Find the value of Z on the standard normal distribution axis for each of
the areas listed below.
a. The area between 0 and z is 0.3413
b. The area to the right of z is 0.8982
Solution:
a. Table 2 must be used in reverse. We look in the body of the table for
the area 0.3413. It appears in the row marked 1.0 and the column marked
0.00. Thus, the value of Z is 1.00.
b. Since the area given is greater than 0.5, we know that z must be less
than zero. The area between z and 0 is 0.8982 - 0.5000 = 0.3982.
Now assume that z is positive and find z so that the area between 0
and z is 0.3982. The area of 0.3982 does not appear in the tables; the
closest numbers are 0.3980 and 0.3997. The exact value of z could
be determined by interpolation, but we will use z= 1.27 since 0.3980
is closer to 0.3982 than is 0.3997. We must remember that z must be
to the left of zero (a negative number). Thus, z = -1.27.
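Reading the table in reverse corresponds to evaluating the inverse of the cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; each given area is first converted to an area to the left of z, which is the quantity `inv_cdf` expects.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# a. Area between 0 and z is 0.3413, so the area to the left of z is 0.5 + 0.3413.
z_a = Z.inv_cdf(0.5 + 0.3413)

# b. Area to the right of z is 0.8982, so the area to the left of z is 1 - 0.8982.
z_b = Z.inv_cdf(1.0 - 0.8982)

print(round(z_a, 2), round(z_b, 2))
```

The second value is slightly beyond -1.27 before rounding, which is the refinement that interpolation in the table would give.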
Theorem 6.1
Standardization of a normal random variable
If X is a normal random variable, the mean of which is μ and the standard
deviation of which is σ, then
Z = (X - μ)/σ
is a standardized normal random variable with a mean of zero and a
standard deviation of one.
The following examples illustrate the use of Theorem 6.1, and more
generally, the applicability of the normal distribution model.
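As a minimal sketch of how Theorem 6.1 is applied, the code below standardizes two endpoints and then finds the probability between them. The mean, standard deviation, and endpoints are hypothetical values chosen for illustration; they are not taken from the examples in the text.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Theorem 6.1: convert a value of X to the standard normal scale."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0    # hypothetical mean and standard deviation of X
x1, x2 = 85.0, 130.0       # hypothetical interval endpoints

z1 = standardize(x1, mu, sigma)
z2 = standardize(x2, mu, sigma)

# P(x1 < X < x2) equals the standard normal area between z1 and z2.
prob = NormalDist().cdf(z2) - NormalDist().cdf(z1)
print(z1, z2, round(prob, 4))
```

Standardizing first means that a single table of standard normal areas serves every choice of μ and σ.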
Example 6.7
Solution:
area can be determined by first standardizing x1 and x2:
Tutorial 6
2.
4. In each case check whether the given values can be looked upon as the
values of the probability function of a random variable which can take on
only the values 1, 2, 3, and 4, and explain your answers:
5. Use the formula for the binomial distribution to find the probability of
getting:
(a) exactly 3 heads in 8 flips of a balanced coin;
(b) at most 3 heads in 8 flips of a balanced coin;
(c) exactly 1 one in 3 rolls of a balanced die;
(d) at most 1 one in 3 rolls of a balanced die.
(e) Calculate E(X) and Var(X).
7
Regression Analysis
7.1 Introduction
7.2 Relationships between variables
7.3 Simple linear regression model
7.4 Fitting of a simple linear regression model
Tutorial 7
7.1 Introduction
7.2 Relationships between Variables
Example 7.1
Suppose for every unit of a product sold, a company makes a profit of $3.
Let X = number of units sold, and Y = total profit. Then Y = 3X. Illustrate
this linear relationship.
Solution:
Example 7.2
Solution:
These data are plotted in Figure 7.3. Clearly, the taller a man is, the more
he tends to weigh. But the relationship is not a perfect one, as is evident in Figure
7.3. The line in Figure 7.3 has been drawn to fit reasonably well through the
ten points, and the points are scattered about this line. The
Figure 7.2 Graph of the nonlinear functional relation Y = X^2
Table 7.1 Heights (X) and weights (Y) of ten randomly selected
adult males
Height (X), inches    Weight (Y), pounds
60                    110
65                    150
74                    200
70                    185
70                    170
66                    160
68                    180
72                    195
64                    135
71                    215
7.3 Simple Linear Regression Model
(2) the scattering of points about the "curve" that represents the
relationship between X and Y.
These two assumptions are illustrated in Figure 7.4 for the Example 7.2
data. The systematic way in which Y varies as a function of X is identified
as a straight line, the regression line of Y on X. The regression line passes
exactly through the means of the conditional probability distributions of Y,
given a value of X. The data are collected by taking random samples from
the conditional probability distribution of Y for values of X. For example,
from Table 7.1, when X = 60 inches, Y was observed to be 110 pounds. This
particular value of Y represents a random sample of size one drawn from the
conditional probability distribution of Y when X = 60 inches.
Figure 7.4 Graphical form of the simple linear regression model
We could select a set of pressures (values of X) and then run the production
process at each pressure setting one or more times to produce observations
on Y. Alternatively, we could generate the sample data by taking a survey.
For example, we could randomly sample ten adult males to determine their
heights and weights. But the survey method has the disadvantage that we
must take whatever values of X (height) occur in the survey; the selection of
the set of values of X, the independent variable, is out of our control. We
might be so unfortunate, for instance, as to find that all ten men in our survey
were 64 inches tall. We ideally want a spread of X values over the range of
interest and over which the regression line will be built.
1. For the ith trial, the expected value of the error component ei is zero
[E(ei) = 0], and the variance of the error component [V(ei)] is σ² and is
constant for all values of i, i = 1, 2, ..., n.
2. The error components in any pair of trials, say the ith and the jth, are
uncorrelated.
3. The terms β0 and β1 in the model are parameters whose values are
typically unknown and must, therefore, be estimated from sample data.
Further, Xi is considered to be a known constant in the model.
2. By using the expectation operator rule given in Chapter 5, we get
E[Y] = β0 + β1 X.
4. The observed value of Y in the ith trial is larger or smaller than μy/x
by the amount ei, the value of the error component in the ith trial.
5. By the second assumption, the outcome in any trial is not affected by or
does not itself affect the error term in any other trial.
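Assuming the model has the usual simple linear form, the assumptions and implications listed above can be summarized compactly as follows. The symbols ei, β0, β1, and σ² are those of the text; writing the model as a single equation and using the covariance notation are choices made here for illustration.

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_i + e_i, \qquad i = 1, 2, \ldots, n, \\
E(e_i) &= 0, \qquad V(e_i) = \sigma^2, \qquad \operatorname{Cov}(e_i, e_j) = 0 \ (i \neq j), \\
E[Y_i] &= \beta_0 + \beta_1 X_i + E(e_i) = \beta_0 + \beta_1 X_i, \\
V[Y_i] &= V(e_i) = \sigma^2 .
\end{align*}
```

Because Xi is treated as a known constant, all of the randomness in Yi comes from the error component ei.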
7.4 Fitting of a Simple Linear Regression Model
Since β0 and β1 are generally not known in a regression problem, they must
be estimated from sample data taken on the dependent variable Y for a
number of values of the independent variable X. These pairs of sample
values are obtained either by experimentation or by survey. The data given
in Table 7.1 were determined by survey: 10 adult males were selected at
random and their heights and weights were recorded.
Returning to the data given in Table 7.1, we will first produce a scatter plot
of these data. The scatter plot is given in figure 7.5. In figure 7.6, we have
superimposed two "fitted" lines through this scatter of points,
Figure 7.6 Scatter plot and fitted lines for data in Table 7.1
The specific values of β0 and β1 that minimize LS are the regression
coefficient estimates, denoted by b0 and b1 respectively.
Thus, the least squares criterion requires that we find a line, denoted by Ŷ =
b0+ b1 X, such that the sum of the squared vertical deviations between the
line and the scatter of points is minimized. In figure 7.6, the vertical
deviations corresponding to the line Ŷ = 170 + 0X, where b0 = 170 and
b1 = 0, are indicated. Obviously, the line Ŷ = -310.76 + 7.07X in Figure
7.6, where b0 = -310.76 and b1 = 7.07, does much better in the least
squares sense because its vertical deviations from the scatter of points, when
squared and summed, will be less than the sum of squared deviations for the
line Ŷ = 170 + 0 X.
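The comparison between the two lines can be made concrete by computing the sum of squared vertical deviations for each. The sketch below uses the Table 7.1 data and the two sets of coefficients quoted in the text; the function and variable names are illustrative.

```python
heights = [60, 65, 74, 70, 70, 66, 68, 72, 64, 71]            # X, inches (Table 7.1)
weights = [110, 150, 200, 185, 170, 160, 180, 195, 135, 215]  # Y, pounds (Table 7.1)

def sum_squared_deviations(b0, b1, xs, ys):
    """Sum of squared vertical deviations between the points and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

ls_flat = sum_squared_deviations(170.0, 0.0, heights, weights)       # line 170 + 0X
ls_fitted = sum_squared_deviations(-310.76, 7.07, heights, weights)  # line -310.76 + 7.07X

print(round(ls_flat, 2), round(ls_fitted, 2))
```

The horizontal line at 170 (the mean weight) leaves a sum of squared deviations of 9200, while the sloped line reduces it to roughly 1107.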
It turns out that the values of b0 and b1 that minimize LS are solutions to
the following two simultaneous equations, which are referred to as the
normal equations:
Solving the normal equations for b0 and b1 produces the point estimators of
β0 and β1 respectively. The resulting formulas for b0 and b1
are given below:
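For reference, the standard least squares results for a straight line are sketched below. They are stated under the assumption that LS denotes the sum of squared vertical deviations of the n sample points (Xi, Yi) from the line; the summation notation is a choice made here.

```latex
\begin{align*}
LS &= \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2, \\
\sum_{i=1}^{n} Y_i &= n\,b_0 + b_1 \sum_{i=1}^{n} X_i, \\
\sum_{i=1}^{n} X_i Y_i &= b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2, \\
b_1 &= \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2},
\qquad b_0 = \bar{Y} - b_1 \bar{X}.
\end{align*}
```

The second and third lines are the two normal equations, obtained by setting the partial derivatives of LS with respect to β0 and β1 equal to zero; the last line is their solution.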
Example 7.3
Let us now fit a simple linear regression model to the data on heights and
weights given in Table 7.1.
Solution:
The computations for determining b0 and b1 are given in Table 7.2. The
format in this table provides a convenient worksheet for finding the necessary
components in the formulas for b0 and b1.
Thus, the fitted regression line is Ŷ = -310.76 + 7.07X, and this is the best
fitting line based upon the least squares criterion.
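The worksheet computation can be reproduced in a few lines. The sketch below applies the standard least squares formulas for the slope and intercept to the Table 7.1 data; the variable names are illustrative and the layout is not meant to mirror Table 7.2.

```python
heights = [60, 65, 74, 70, 70, 66, 68, 72, 64, 71]            # X, inches (Table 7.1)
weights = [110, 150, 200, 185, 170, 160, 180, 195, 135, 215]  # Y, pounds (Table 7.1)

n = len(heights)
sum_x = sum(heights)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))
sum_x2 = sum(x * x for x in heights)

# Standard least squares formulas for a straight line.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = sum_y / n - b1 * sum_x / n

# Intercept obtained when the slope is rounded to two decimals first.
b0_rounded_slope = sum_y / n - round(b1, 2) * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b1, 2), round(b0, 2), round(b0_rounded_slope, 2))
```

Carrying the slope unrounded gives an intercept of about -310.62; rounding b1 to 7.07 before computing b0 reproduces the value -310.76 quoted in the text.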
Tutorial 7
5. Discuss the assumptions made in using simple linear regression about the
distributions of the conditional mean values.
7. Plot each of the following sets of data as a scatter diagram. Which curves
seem to fit the data best?
Determine the regression equation for each set of data.
Table 1 Binomial Distribution Probability
Table 2 Standard normal distribution areas