STAT Note1-4
March 20211
Lecture -one
1. Introduction:
1.1. Definition and Classification of Statistics
Definition of statistics
Statistics is defined as the science of collecting, organizing, presenting, analyzing and
interpreting numerical data in order to make decisions on the basis of such analysis.
Data: are the measurements or observations (values) for a variable (factor).
- A collection of data values forms a data set.
- Each value in the data set is called a data value or datum.
Classification of Statistics
There are two broad branches of statistics:
i.) Descriptive statistics: a statistical method that deals with organizing or summarizing a given set
of data into a meaningful form. For example, the figures in newspapers, magazines, reports and other publications
come from data that have been summarized and presented in a form that is easy for the reader to
understand.
Here there is no generalization or conclusion about the population.
Consists of collection, organization, and presentation of data.
E.g. frequency distribution, measures of central tendency (such as mean, median), measures of
dispersion (like range, standard deviation, variance, etc.)
ii.) Inferential statistics: the process of drawing conclusions about a population based on the
information obtained from a sample. Because of time, cost and other constraints, data are
collected from only a small portion of the group (a sample).
It makes estimates and
tests claims about the characteristics of a population based on the sample.
Used to describe, infer, estimate, or approximate the characteristics of the target population.
Examples: As a result of the recent reduction in oil production by oil-producing nations, we
can expect the price of gasoline to double in the next year.
As a result of a recent survey of public opinion, most Americans are in favor of building additional
nuclear power plants.
1.2. Stages in Statistical Investigation
The area of statistics points out the following five stages.
1. Data collection
2. Organization of data
Adjusted by Sibex A. LECTURE NOTES Page 1
RVU Introduction To Statistics For Economic
3. Presentation of data
4. Analysis of data
5. Interpretation
Stage 1. Data collection
Is the process of gathering information or data about the variable of interest for our specific
purpose.
Constitutes the first step in a statistical investigation.
Utmost care must be exercised.
The data may be available from existing published or unpublished sources, i.e. data may be
obtained from either primary or secondary sources.
Stage 2. Organization of data
Is the process of---.
Editing: is the process of checking and correcting the collected data for omissions, inconsistencies, irrelevant
answers and wrong computations.
Classification: is the task of grouping the collected and edited data into similar
categories based on some criteria.
Tabulation: is putting the classified data in the form of a table.
Arranging or classifying data in a suitable order makes the information easier to
present.
Stage 3. Presentation of data
Data are presented in the form of charts and diagrams.
Large data sets are presented in tables in a summarized and condensed manner.
The main purpose of data presentation is to facilitate statistical analysis.
Stage 4. Analysis of data
This is the stage where we critically study the data to draw conclusions about the
population parameter.
The purpose of data analysis is to dig out information useful for decision making.
Analysis usually involves mathematical techniques, some of them highly complex and sophisticated,
such as the calculation of averages, the computation of measures of dispersion, and
regression and correlation analysis.
Stage 5. Interpretation
This is the stage where we draw valid conclusions and make decisions from the results obtained
through data analysis.
It is a difficult task and necessitates a high degree of skill and experience, because if data that
have been analyzed are not properly interpreted, the whole purpose of the investigation may be
defeated and fallacious conclusions may be drawn. Great care is therefore needed when making
interpretations.
1.3. Definitions of Some Basic Terms
a) Population: is the totality of cases (items) under consideration in a given investigation or
research.
Ex. all the students in WU, the workers in a factory, etc.
A population can be finite or infinite.
Finite population: is a population that is limited in size. E.g. the number of
workers in farmland hotel.
Infinite population: is a population that is unrestricted in nature (not limited in size).
E.g. the production of bacteria (the observations cannot be counted, even in theory).
b) Sample: is a subgroup or part of the population selected by some method (sampling
technique) in order to estimate the characteristics of the population. E.g. 25 staff of
WU out of 350 staff.
c) Elementary unit: is the specific person, business, product, and so on, with some
characteristics to be measured or categorized (on which information is recorded). E.g. a
particular person in the class whose weight is recorded.
d) Sampling unit: is one of the finite number of distinct, non-overlapping and identifiable units obtained
by dividing the population for the purpose of sample selection.
e) Variable: is a factor or characteristic that can take on different possible values or outcomes.
Examples: income, height, weight, sex, salary, etc.
A variable can be qualitative or quantitative.
A qualitative variable: is a variable that can be expressed in categorical ways.
Generally described by words or letters, i.e. it cannot be expressed in terms of
numbers. For instance, hair color (black, dark brown, light brown), sex, marital
status, religion, region, political affiliation, etc.
A quantitative variable: is a variable that can be measured in numerical ways
(measurable quantity).
Results from counting or measuring attributes of a population.
May be either discrete or continuous.
Observations obtained through such a process are called quantitative data.
c) Statistics and research: there is hardly any advanced research going on without the use of
statistics in one form or another. Statistics are used extensively in medical, pharmaceutical and
agricultural research.
Usefulness of statistics:-
• Statistics condenses and summarizes complex data. The original set of data (raw data) is
normally voluminous and disorganized unless it is summarized and expressed in a few numerical
values.
• Statistics facilitates comparison of data. Measures obtained from different sets of data can be
compared to draw conclusions about those sets. Statistical values such as averages, percentages,
ratios, etc. are the tools that can be used for the purpose of comparing sets of data.
• Statistics helps in predicting future trends. Statistics is extremely useful for analyzing past
and present data and predicting future trends.
• Statistics influences the policies of government. The results of statistical studies in areas such as taxation,
the unemployment rate, the performance of every sort of military equipment, etc. may
convince a government to review its policies and plans with a view to meeting national needs and
aspirations.
• Statistical methods are very helpful in formulating and testing hypotheses and in developing new
theories.
Weaknesses of statistics:
It doesn't deal with single (individual) values. Statistics deals only with aggregate values, but
in some situations a single individual is highly important to consider. Example: the
sun, the driver of a bus, a president, etc.
It can't deal with qualitative characteristics. It only deals with data which can be quantified.
Example: it does not deal with marital status (married, single, divorced, widowed) as such, but it does deal with
the number of married, the number of single and the number of divorced people.
Its conclusions are not universally true. Statistical conclusions are true only under certain
conditions or true only on average. The conclusions drawn from the analysis of the sample may,
perhaps, differ from the conclusions that would be drawn from the entire population. For this
reason, statistics is not an exact science.
Its interpretation requires a high degree of skill and understanding of the subject. It requires
extensive training to read and interpret statistics in their proper context. It may lead to wrong
conclusions if inexperienced people try to interpret statistical results.
It can be misused. Sometimes statistical figures can be misleading unless they are carefully
interpreted.
Example: the report of the head of the ministry on the mission against the Ethio-Somalia terrorist attack stated that
25% of the terrorists were dismissed on the first day, 50% on the second day and 75% on the third day. However, we may doubt
how the progress of the mission was measured and quantified. This leads to misuse of
statistical figures.
1.5. Scales of Measurement
The various measurement scales result from the fact that measurement may be carried out
under different sets of rules. Generally, there are four types of measurement scales. They are
(from lowest to highest level):
i.) Nominal Scale:
Is characterized by data that consist of names, labels, or categories only.
The data cannot be arranged in an ordering scheme.
The arithmetic operations of addition, subtraction, multiplication, and division cannot be
performed on them.
Data measured using a nominal scale are qualitative.
E.g. Religion: Christianity, Islam, Hinduism, etc.; Sex: Male, Female; Eye color: brown, black,
etc.
ii.) Ordinal Scale:
Data measured using an ordinal scale are similar to nominal scale data, but there is a
big difference: ordinal scale data can be ordered.
For example: a list of the top five national parks in Africa. The top five national parks in
Africa can be ranked from one to five, but we cannot measure differences between the
data.
Like nominal scale data, ordinal scale data cannot be used in calculations.
The ordinal scale applies whenever observations are not only different from category to category, but can also be ranked
according to some criterion. The variables deal with relative differences rather than
with quantitative differences.
Ordinal data are data which can have meaningful inequalities. The inequality signs < or >
may assume any meaning like 'stronger, softer, weaker, better than', etc.
E.g.: Patients may be characterized as unimproved, improved and much improved.
E.g.: Individuals may be classified according to socio-economic status as low, medium and high. It is
usually impossible to quantify the difference between a member of one category and a member of the next adjacent
category.
iii.) Interval Scale:
Interval scale data can be measured, though the data do not have a true starting point.
It is not only possible to order measurements; the distance between any two
measurements is also known, but quotients are not meaningful.
There is no true zero point, only an arbitrary zero point. Interval data are the types of
information in which an increase from one level to the next always reflects the same
increase in the characteristic. It is possible to add or subtract interval data, but they may not be
multiplied or divided.
E.g.: A temperature of zero degrees does not indicate lack of heat. Consider the two common temperature
scales, Celsius (°C) and Fahrenheit (°F). The same difference exists between 10 °C
(50 °F) and 20 °C (68 °F) as between 25 °C (77 °F) and 35 °C (95 °F), i.e. the measurement scale is
composed of equal-sized intervals. But we cannot say that a temperature of 20 °C is twice as hot as
a temperature of 10 °C, because the zero point is arbitrary.
iv.) Ratio Scale:-
It is like interval scale data, but it has a 0 point and ratios can be calculated.
Characterized by the fact that equality of ratios as well as equality of intervals may be
determined.
For example, four multiple choice statistics final exam scores are 80, 68, 20 and 92 (out of a
possible 100 points). The exams are machine-graded.
The data can be put in order from lowest to highest: 20, 68, 80, 92. Ratios are also meaningful:
a score of 80 is four times a score of 20.
CHAPTER - 2
Under this method, a list of questions related to the survey is prepared and sent to the various
respondents by hand, post, website, email, etc. However, this method cannot be used if the
respondent is illiterate.
The following are the major points that we need to take into account while preparing the
questionnaire. The number of questions should be small. Naturally, respondents are not
comfortable with lengthy questionnaires, which usually bore them. If a
lengthy questionnaire is unavoidable, it should preferably be divided into two or more parts.
Source of Data
Statistical data may be classified under two categories depending upon the source.
1. Primary data: data collected by the investigator himself for the purpose of a specific inquiry
or study.
Mostly generated by surveys conducted by individuals or research institutions.
They are more reliable and accurate, since the investigator can extract the correct information by
removing doubts, if any, in the minds of the respondents regarding certain questions.
2. Secondary data: when an investigator uses data which have already been collected by
others, such data are called secondary data.
Such data are primary data for the agency that collected them, and become secondary for
someone else who uses them for his own purposes.
Examples are books, journals, reports, etc.
When our source is secondary data, check:
The type and objective of the situations.
That the purpose for which the data were collected is compatible with the present problem.
That the nature and classification of the data are appropriate to our problem.
That there are no biases and misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
2.2. Methods of Data Presentation
Having collected and edited the data, the next important step is to organize it, that is, to present it
in a readily comprehensible condensed form that aids in drawing inferences from it. It is
also necessary that the like be separated from the unlike ones.
The presentation of data is broadly classified in to the following two categories:
Tabular presentation
Diagrammatic and Graphic presentation.
ii. Ungrouped Frequency Distribution (UFD): a UFD is a table of all the raw score values
that could possibly occur in the data, along with the number of times each
actually occurs.
A UFD is often constructed for a small set of data or for data of a discrete variable.
Constructing an ungrouped frequency distribution:
First find the smallest and largest raw score in the collected data.
Arrange the data in order of magnitude and count the frequency.
To facilitate counting one may include a column of tallies.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50
workers of a company over the last 6 weeks.
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
i. Construct ungrouped frequency distribution
ii. How many workers had at least 1 day of sick leave?
iii. How many workers had between 2 and 6 days of sick leave?
Solution:
i. Since this data set contains only a relatively small number of distinct or different values, it is
convenient to represent it in a frequency table which presents each distinct value along with its
frequency of occurrence.
ii. Since 12 of the 50 workers had no days of sick leave, the answer is 50 - 12 = 38.
iii. Taking "between 2 and 6" to mean strictly between, the answer is the sum of the frequencies for the values 3, 4 and 5, that is 4 + 5 + 8 = 17.
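The three parts of Example 2.2 can be checked with a short Python sketch. The variable names are illustrative only, and part iii is read as "strictly between 2 and 6", as in the solution above.

```python
from collections import Counter

# Sick-leave days for the 50 workers in Example 2.2
days = [
    2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1,
    7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
    1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5,
    1, 3, 0, 2, 4, 5, 0, 5, 7, 5, 1, 1,
    0, 2,
]

freq = Counter(days)  # maps each distinct value to its frequency

# i. Ungrouped frequency distribution, in order of magnitude
for value in range(min(days), max(days) + 1):
    print(value, freq[value])  # a value that never occurs prints frequency 0

# ii. Workers with at least 1 day of sick leave
at_least_one = sum(f for v, f in freq.items() if v >= 1)

# iii. Workers with strictly between 2 and 6 days (values 3, 4 and 5)
between_2_and_6 = sum(f for v, f in freq.items() if 2 < v < 6)

print(at_least_one, between_2_and_6)
```

Counting with `Counter` replaces the tally column suggested in the construction steps.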
iv. Grouped Frequency Distribution (GFD): is a frequency distribution having several values
grouped into one class.
* Usually used when the range of the data is large.
The classes of a grouped frequency distribution must be mutually exclusive, i.e. classes must not overlap one
another. Class intervals are of two types:
(a) Inclusive
(b) Exclusive
a) In the inclusive type of frequency distribution, the upper limit of one class does not coincide with
the lower limit of the next class.
b) In the exclusive type of frequency distribution, the upper limit of one class coincides with the
lower limit of the next class.
Definition of some basic terms
Grouped frequency distribution: is a frequency distribution in which several numbers are grouped into one class.
Class limits (CL): separate one class from another. The limits could actually appear in the
data, and there are gaps between the upper limit of one class and the lower limit of the next class.
Unit of measure (U): the smallest possible difference between successive values. E.g. 1, 0.1,
0.01, 0.001, etc.
Class boundaries: separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data. There is no gap between the upper
boundary of one class and the lower boundary of the succeeding class. The lower class boundary
is found by subtracting half of the unit of measure from the lower class limit, and the upper class
boundary is found by adding half of the unit of measure to the upper class limit.
Class width (W): the difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
Class mark (midpoint): found by adding the lower and upper class limits (or boundaries) and
dividing the sum by two.
Cumulative frequency: the number of observations less than the upper class boundary, or greater than the lower
class boundary, of a class.
CF (less than type): the number of values less than the upper class boundary of a given
class.
CF (greater than type): the number of values greater than the lower class boundary of a
given class.
Relative frequency (Rf): the frequency divided by the total frequency. This gives the percent
of values falling in that class.
$Rf_i = f_i / n = f_i / \sum f_i$
Relative cumulative frequency (RCf): the running total of the relative frequencies, or the
cumulative frequency divided by the total frequency. It gives the percent of the values which are less
than the upper class boundary, or the reverse.
$RCf_i = Cf_i / n = Cf_i / \sum f_i$
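The terms defined above can be illustrated with a minimal Python sketch. It borrows the inclusive classes and frequencies of Example 3.12 in Chapter Three, and it assumes a unit of measure of 1 (data recorded as whole numbers); the dictionary keys are illustrative names only.

```python
# Inclusive class limits and frequencies taken from Example 3.12
limits = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64)]
freqs = [7, 17, 16, 6, 4]

unit = 1          # unit of measure U (assumption: whole-number data)
n = sum(freqs)    # total frequency

rows = []
cum = 0
for (lower, upper), f in zip(limits, freqs):
    cum += f      # running total = less-than cumulative frequency
    rows.append({
        "limits": (lower, upper),
        "boundaries": (lower - unit / 2, upper + unit / 2),
        "mark": (lower + upper) / 2,   # class mark (midpoint)
        "f": f,
        "less_than_cf": cum,
        "rf": f / n,                   # relative frequency
        "rcf": cum / n,                # relative cumulative frequency
    })

# Class width: upper boundary minus lower boundary of a class
width = rows[0]["boundaries"][1] - rows[0]["boundaries"][0]

# More-than cumulative frequency: values above each lower class boundary
more_than_cf = [n - r["less_than_cf"] + r["f"] for r in rows]

for r in rows:
    print(r)
```

The same width is obtained from the difference between consecutive lower limits (25 - 15).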
2.3. Cumulative Frequency Distribution: is a frequency distribution that displays the sum of
the frequencies of the classes above or below a given class.
There are two types of cumulative frequency:
a) Less than cumulative frequency (Lcf): used when interest focuses on the total number of
observations below a specified value.
b) More than cumulative frequency (Mcf): used when interest focuses on the total
number of observations above a specified value.
Fig 3 Histogram (axes: Frequency against class)
iv. Frequency Polygon: is a line graph that displays the data using a line that connects points
plotted for the frequencies at the class marks,
i.e. the frequencies give the heights of the points above the class marks.
* A frequency polygon can also be superimposed on a histogram.
[Figure: frequency polygon superimposed on a histogram; axes: Frequency against class boundaries]
Chapter - Three
last subscript) $n$ denotes the number of observations in the data and $x_i$ is the $i$th observation. Then
that is
= +
3. $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$
4. =
5.
3.3. Properties/characteristics of measures of central tendency
A good average should be:
Rigidly defined (unique).
Based on all observations under investigation.
Easily understood.
Simple to compute.
Suitable for further mathematical treatment.
Little affected by fluctuations of sampling.
Not highly affected by extreme values.
3.4. Types of measures of central tendency
Measures of central tendency give us information about the location of the center of the
distribution of data values. A single value that describes the characteristics of the entire mass of
data is called a measure of central tendency. The following are the types of measures of central tendency:
The Means
Arithmetic Mean
Weighted Arithmetic Mean
Combined mean
Geometric Mean
Harmonic Mean
The Median
The Mode or modal value
3.3.1. The Mean
a) Arithmetic mean: is defined as the sum of the measurements of the items divided by the
total number of items:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$
= =
= = = =8
= = = =7
= =
Example 3.4
Calculate the arithmetic mean of the pulse rates (beats per minute) of eleven students:
60 60 71 68 71 72 71 76 72 80 80
$\bar{x} = \frac{\sum x_i}{n} = \frac{60 + 60 + 71 + \dots + 80}{11} = \frac{781}{11} = 71$
In this case there are two 60's, one 68, three 71's, two 72's, one 76, and two 80's. The number of
times each number occurs is called its frequency and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi        60   68   71   72   76   80   Total
Frequency, fi     2    1    3    2    1    2    11
xi fi           120   68  213  144   76  160   781
The formula for the arithmetic mean for data of this type is
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{781}{11} = 71.$
The mean pulse rate (beats per minute) of the eleven students is 71.
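As a check on Example 3.4, the frequency-table form of the mean can be sketched in Python; the names `values` and `freqs` are illustrative.

```python
# Distinct pulse rates and their frequencies from Example 3.4
values = [60, 68, 71, 72, 76, 80]
freqs = [2, 1, 3, 2, 1, 2]

n = sum(freqs)                                        # total frequency
total_fx = sum(x * f for x, f in zip(values, freqs))  # sum of xi * fi
mean = total_fx / n

print(n, total_fx, mean)
```

The result agrees with the mean obtained by adding the eleven raw values directly.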
Example 3.5
The following frequency table gives the height (in inches) of 100 students in a college.
Class boundary 60-62 62-64 64-66 66-68 68-70 70-72 Total
Frequency (fi) 5 18 42 20 8 7 100
Calculate the mean
Solution:
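The formula to be used for the mean is as follows:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $x_i$ is the mid-point of the $i$th class.
Let us calculate these values and make a table for these values for the sake of convenience.
Class boundary   Frequency (fi)   Mid-point (xi)   fi xi
60 - 62             5                61               305
62 - 64            18                63              1134
64 - 66            42                65              2730
66 - 68            20                67              1340
68 - 70             8                69               552
70 - 72             7                71               497
Total             100                                6558
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6558}{100} = 65.58$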
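The grouped-data calculation of Example 3.5 can be sketched the same way, with each class represented by its mid-point. This is only an illustration; the variable names are not from the notes.

```python
# Class boundaries and frequencies from Example 3.5 (heights in inches)
boundaries = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
freqs = [5, 18, 42, 20, 8, 7]

# Mid-point of each class stands in for every value in that class
midpoints = [(low + high) / 2 for low, high in boundaries]

n = sum(freqs)
total_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = total_fx / n

print(midpoints, total_fx, grouped_mean)
```

Because individual heights are replaced by mid-points, the result is an approximation to the mean of the raw data.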
The sum of squares of deviations from the mean is the least. That is, $\sum (x_i - A)^2$ is minimum when
$A = \bar{x}$.
If the mean of , is , then
Example 3.6
A student’s final mark in Mathematics, Physics, Chemistry and Biology are respectively A, B, D
and C. If the respective credits received for these courses are 4, 4, 3 and 2, determine the
approximate average mark the student has got for the course.
Solution
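We use a weighted arithmetic mean, the weight associated with each course being taken as the
number of credits received for the corresponding course. The grade points are taken as A = 4, B = 3, C = 2 and D = 1.
Grade point, xi    4    3    1    2   Total
Credits, wi        4    4    3    2   13
xi wi             16   12    3    4   35
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{35}{13} = 2.69$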
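A small Python sketch of the weighted mean in Example 3.6 follows. The grade-point mapping A = 4, B = 3, C = 2, D = 1 is an assumption read off the table in the solution, and the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Assumed grade points: A = 4, B = 3, C = 2, D = 1
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}

grades = ["A", "B", "D", "C"]   # Mathematics, Physics, Chemistry, Biology
credits = [4, 4, 3, 2]          # credit hours used as weights

points = [grade_points[g] for g in grades]
average_mark = weighted_mean(points, credits)

print(round(average_mark, 2))
```

The same function gives the combined mean of several groups when the group means are passed as `values` and the group sizes as `weights`.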
c) Combined mean: when a set of observations is divided into k groups and $\bar{x}_j$ is the mean of the $n_j$
observations of group j (j = 1, 2, ..., k), then the combined mean, denoted by $\bar{x}_c$, of all observations taken
together is given by
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \dots + n_k \bar{x}_k}{n_1 + n_2 + \dots + n_k}$
This is a special case of the weighted mean. In this case the sample sizes are the weights.
Example 3.7
In the previous year there were two sections taking the Statistics course. At the end of the semester,
the two sections got average marks of 70 and 78. There were 45 and 50 students in the two sections,
respectively. Find the mean mark for all the students.
Solution:
$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{45(70) + 50(78)}{45 + 50} = \frac{7050}{95} = 74.21$
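$GM = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{n} \log x_i \right)$
Example 3.8
Find the G.M. of (a) 3 and 12 (b) 2, 4 and 8
Solution: a) $GM = \sqrt{3 \times 12} = 6$; b) $GM = \sqrt[3]{2 \times 4 \times 8} = 4$
Geometric mean for discrete data arranged in FD: when the numbers $x_1, x_2, \dots, x_k$ occur
with frequencies $f_1, f_2, \dots, f_k$, respectively,
$GM = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{k} f_i \log x_i \right)$, where $n = \sum_{i=1}^{k} f_i$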
Example 3.9
Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values 3 4 5 6
Frequency 2 3 1 2
G.M. = $\sqrt[8]{3^2 \times 4^3 \times 5 \times 6^2} = \sqrt[8]{103680} = 4.236$
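The logarithmic form of the geometric mean can be sketched in Python as below. The function name and the optional `freqs` argument are illustrative; base-10 logarithms are assumed for the "antilog".

```python
import math

def geometric_mean(values, freqs=None):
    """Antilog of the (frequency-weighted) mean of the base-10 logs."""
    if freqs is None:
        freqs = [1] * len(values)   # raw data: every value counted once
    n = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n
    return 10 ** mean_log

gm_a = geometric_mean([3, 12])                       # Example 3.8 (a)
gm_b = geometric_mean([2, 4, 8])                     # Example 3.8 (b)
gm_fd = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])   # Example 3.9

print(round(gm_a, 3), round(gm_b, 3), round(gm_fd, 3))
```

Working with logarithms avoids forming the large product under the root, which matters for longer data sets. The values must all be positive.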
e) Harmonic Mean
It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals:
$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
Example 3.10: A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph.
Find the harmonic mean of the three velocities.
Solution
$HM = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9$.
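Example 3.10 can be checked with a short Python sketch, which also shows why the harmonic mean suits speeds over equal distances: it equals total distance divided by total time. The names are illustrative.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / x for x in values)

speeds = [25, 50, 75]   # mph, each held for 25 miles
hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time
distance_per_leg = 25
total_distance = distance_per_leg * len(speeds)
total_time = sum(distance_per_leg / s for s in speeds)
average_speed = total_distance / total_time

print(round(hm, 1), round(average_speed, 1))
```

The arithmetic mean of the three speeds (50 mph) would overstate the average speed, because more time is spent on the slower legs.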
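Harmonic mean for discrete data arranged in FD: if the data are arranged in the form of a
frequency distribution,
$HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$, where $n = \sum_{i=1}^{k} f_i$
Harmonic mean for continuous grouped FD: whenever the frequency distribution is
grouped and continuous, the class marks of the class intervals are taken as $x_i$ and the above
formula is applied: H.M. = $\frac{n}{\sum f_i / x_i}$, where $n = \sum f_i$
For positive values, $AM \ge GM \ge HM$.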
3.3.2. Median
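The median is, as its name indicates, the middle-most value in the arrangement, which divides the
data into two equal parts. It is obtained by arranging the data in increasing or decreasing order and taking:
a/ $\tilde{x}$ = the $\left(\frac{n+1}{2}\right)$th observation, if n is odd
b/ $\tilde{x}$ = the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th observations, if n is even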
Example 3.11
Find the median for the following data.
a/ -5 15 10 5 0 2 1 4 6 and 8
b/ 5 2 2 3 1 8 4
Solution;
a) The data in ascending order is given by:
-5 0 1 2 4 5 6 8 10 15
n = 10, so n is even. The two middle values are the 5th and 6th observations, 4 and 5. So the median is
$\tilde{x} = \frac{4 + 5}{2} = 4.5$.
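Both parts of Example 3.11 can be worked through with a minimal Python sketch of the odd/even rule; the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle pair

median_a = median([-5, 15, 10, 5, 0, 2, 1, 4, 6, 8])   # part a/, n = 10
median_b = median([5, 2, 2, 3, 1, 8, 4])               # part b/, n = 7

print(median_a, median_b)
```

For part b/ the sorted data are 1 2 2 3 4 5 8, so the 4th observation is the median.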
For grouped data, the median is found by
$\tilde{x} = L + \frac{w}{f_{med}} \left( \frac{n}{2} - CF \right)$
Where: L = the lower class boundary of the median class; w = the class width of the median
class;
$f_{med}$ = the frequency of the median class; and
CF = the cumulative frequency corresponding to the class preceding the median class, that is, the sum of the
frequencies of all classes lower than the median class. The median class is the class which
contains the (n/2)th observation whether n is odd or even, since the items have already lost their
originality once they are grouped into continuous classes.
Example 3.12: Water percentage in the body of species of Fish is given below. Calculate the
median.
C.I 15-24 25-34 35-44 45-54 55-64 Total
Freq. 7 17 16 6 4 50
Solution: Construct the less than cumulative frequency distribution (7, 24, 40, 46, 50). Then:
since n = 50, n/2 = 25, and the smallest CF greater than or equal to 25 is 40; thus, the median
class is the third class. For this class, L = 34.5, w = 10, the frequency of the median class is 16 and
CF = 24. Applying the formula, we get:
Median = 34.5 + (25 - 24) × 10/16 = 35.1.
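The steps of Example 3.12 can be sketched in Python as follows. The function name is hypothetical, and the class boundaries are obtained from the class limits assuming a unit of measure of 1.

```python
def grouped_median(boundaries, freqs):
    """Interpolated median: L + (n/2 - CF) * w / f for the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency distribution")
    half = n / 2
    cum_before = 0   # CF of the classes preceding the current one
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:   # first class whose CF reaches n/2
            return lower + (half - cum_before) * (upper - lower) / f
        cum_before += f

# Class limits 15-24, ..., 55-64 converted to boundaries (unit of measure 1)
boundaries = [(14.5, 24.5), (24.5, 34.5), (34.5, 44.5), (44.5, 54.5), (54.5, 64.5)]
freqs = [7, 17, 16, 6, 4]

fish_median = grouped_median(boundaries, freqs)
print(fish_median)
```

The unrounded value is 35.125, which the worked solution reports as 35.1.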
Merits of median
• It is less affected by extreme values.
• Median can be calculated even in case of open-ended intervals.
• It can be computed for ratio, interval, and ordinal level of data.
Demerits of median
• Its value is not determined by each & every observation.
• It is not a good representative of the data if the number of items (data) is small.
• The arrangement of items in order of magnitude is sometimes a very tedious process if the
number of items is very large.
3.3.3. The Mode or Modal value
The mode or the modal value is the value with the highest frequency and is denoted by $\hat{x}$. A data
set may not have a mode or may have more than one mode. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multimodal. If a distribution has no modes, then
the distribution is non-modal.
i. Mode of individual series: the mode or the modal value of an individual series (raw data) is
simply obtained by locating the observation with the maximum frequency.
Example 3.13
Consider the following data:
c. 10 40 30 20 50 60. No Mode.
Note that in some samples there may be more than one mode or there may not be a mode. The
mode is not a suitable measure of central tendency in these cases. We use the mode as a measure
of central tendency if we require a measure that takes on one of the sample values. The mode can
be used for variables that are measured on a category (nominal) scale. e.g. the most popular
computer type.
ii. Mode for Grouped Continuous Frequency Distribution
For grouped data, the mode is found by the following formula. In such cases, one can only
determine the modal class easily i.e. the class with the highest frequency. After locating this
class, the mode is interpolated using:
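A commonly used form of this interpolation is sketched below. It is the standard textbook expression, given here as an assumption about the intended formula; the symbols $\Delta_1$, $\Delta_2$, $f_{mo}$, $f_1$ and $f_2$ are introduced only for this sketch.

```latex
\hat{x} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) w,
\qquad \Delta_1 = f_{mo} - f_1, \qquad \Delta_2 = f_{mo} - f_2
```

Here L is the lower class boundary of the modal class, w is its class width, $f_{mo}$ is the frequency of the modal class, and $f_1$ and $f_2$ are the frequencies of the classes immediately preceding and following it.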
A measure is a resistant measure if its value is not affected by an outlier or an extreme data
value.
The mean is not a resistant measure of central tendency because it is not resistant to the
influence of the extreme data values or outliers.
The median is resistant to the influence of extreme data values or outliers and its value does
not respond strongly to the changes of a few extreme data values regardless of how large the
change may be.
The mode has an advantage over both the mean and the median when the data is categorical
since it is not possible to calculate the mean or median for this type of data. Also, the mode
usually indicates the location within a large distribution where the data values are concentrated.
However, the mode cannot always be calculated because if a distribution has all different data
values, then the distribution is non-modal.
In the case of a symmetrical distribution, the mean, median and mode coincide, that is,
mean = median = mode. However, for a moderately asymmetrical (non-symmetrical) distribution,
the mean and mode lie at the two ends and the median lies between them, and they have the following
important empirical relationship:
(Mean - Mode) = 3(Mean - Median).
Example 3.15
In a moderately asymmetrical distribution, the mean and the mode are 30 and 42 respectively.
What is the median of the distribution?
Solution:
Median = (2 × Mean + Mode)/3 = (2 × 30 + 42)/3 = 34
Hence the median of the distribution is 34.
Which of the Three Measures is the Best?
At this stage, one may ask which of these three measures of central tendency is the best.
There is no simple answer to this question, because the three measures are based upon
different concepts. The arithmetic mean is the sum of the values divided by the total number of
observations in the series. The median is the value of the middle observation, and the mode is the value
around which the observations tend to concentrate. As such, the use of a particular measure will largely depend on the purpose of the
study and the nature of the data. For example, when we are interested in knowing the consumers'
preferences for different brands of television sets or kinds of advertising, the choice should go in
favor of the mode. The use of the mean and median would not be proper. However, the median can
sometimes be used in the case of qualitative data when such data can be arranged in ascending
or descending order. Let us take another example. Suppose we invite applications for a certain
vacancy in our University. A large number of candidates apply for that post. We are now
interested to know which age or age group has the largest concentration of applicants. Here,
obviously, the mode will be the most appropriate choice. The arithmetic mean may not be
appropriate as it may be influenced by some extreme values.
CHAPTER - FOUR
MEASURE OF DISPERSION (VARIATION)
Defn: dispersion or variation is any value obtained from the difference of the numbers.
4.1. Objectives of Measuring variation or Dispersion
To judge the reliability of measure of central tendency,
To compare two or more groups of numbers in terms of their variability, and
To further statistical analysis.
4.2. Absolute or Relative Measures
Absolute Measures of Dispersion: the measures of dispersion which are expressed in terms
of the original unit of a series are termed absolute measures. Such measures are not suitable
for comparing the variability of two distributions which are expressed in different units of
measurement or have different average sizes.
Relative Measures of Dispersion: relative measures of dispersion are a ratio or percentage of
a measure of absolute dispersion to an appropriate measure of central tendency and are thus pure
numbers independent of the units of measurement. For comparing the variability of two
distributions (even if they are measured in the same unit), we compute the relative measure of
dispersion instead of the absolute measure of dispersion.
4.3. Types of Measure of Dispersion
There are various measures of dispersion, out of which the most commonly used are:
1. Range (R) and Relative Range (RR)
2. Variance ($s^2$), Standard Deviation (s) and Coefficient of Variation (CV).
3. The Standard Score
I) Range (R) and Relative Range (RR):
a) Range (R): is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores.
Range for raw data: $R = x_{max} - x_{min}$
Range for grouped data: If data are given in the shape of continuous frequency distribution,
the range is computed as:
II) Variance ($s^2$), Standard Deviation (s) and Coefficient of Variation (CV)
Standard Deviation
There is a problem with variances.
Recall that the deviations were squared. That means the units were also squared.
To get the units back the same as the original data values, the square root must be taken.
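Examples: find the variances and standard deviations of the following sample data: 5, 17, 12, 10, and of the
data given below in the form of a frequency distribution.
Solutions: for the raw data, $\bar{x} = 11$.
Xi                      5   10   12   17   Total
$(X_i - \bar{x})^2$    36    1    1   36   74
$s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1} = \frac{74}{3} = 24.67$ and $s = \sqrt{24.67} = 4.97$
For the frequency distribution:
Class    Frequency
40-44     7
45-49    10
50-54    22
55-59    15
60-64    12
65-69     6
70-74     3
$\bar{x} = 55$
Xi(C.M) 42 47 52 57 62 67 72 total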
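Both examples can be worked through with a short Python sketch. It assumes the sample variance uses the n - 1 divisor, which is consistent with the value S = 3.8173 quoted in Example 4.11; the function name and return layout are illustrative.

```python
import math

def sample_variance(values, freqs=None):
    """Return (mean, sum of squared deviations, variance with n - 1 divisor)."""
    if freqs is None:
        freqs = [1] * len(values)   # raw data: every value counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, ss, ss / (n - 1)

# Raw data: 5, 17, 12, 10
mean_raw, ss_raw, var_raw = sample_variance([5, 17, 12, 10])
sd_raw = math.sqrt(var_raw)

# Grouped data: class marks of 40-44, ..., 70-74 with their frequencies
class_marks = [42, 47, 52, 57, 62, 67, 72]
class_freqs = [7, 10, 22, 15, 12, 6, 3]
mean_g, ss_g, var_g = sample_variance(class_marks, class_freqs)
sd_g = math.sqrt(var_g)

print(mean_raw, ss_raw, round(var_raw, 2), round(sd_raw, 2))
print(mean_g, ss_g, round(var_g, 2), round(sd_g, 2))
```

Taking the square root returns the measure to the original units of the data, which is the point made above about the standard deviation.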
$CV = \frac{s}{\bar{x}} \times 100\%$
Example: An analysis of the monthly wages paid (in birr) to workers in two firms A and B belonging
to the same industry gives the following results:
Value          Firm A   Firm B
Mean wage      52.5     47.5
Median wage    50.5     45.5
Variance       100      121
In which firm, A or B, is there greater variability in individual wages?
Solution: calculate the coefficient of variation for both firms.
$CV_A = \frac{\sqrt{100}}{52.5} \times 100\% = 19.05\%$ and $CV_B = \frac{\sqrt{121}}{47.5} \times 100\% = 23.16\%$
Since $CV_A < CV_B$, there is greater variability in individual wages in firm B.
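The comparison of the two firms can be reproduced with a minimal Python sketch; the function name is illustrative, and the standard deviation is taken as the square root of the variance given in the table.

```python
import math

def coefficient_of_variation(mean, variance):
    """Standard deviation as a percentage of the mean."""
    return math.sqrt(variance) / mean * 100

cv_a = coefficient_of_variation(mean=52.5, variance=100)   # Firm A
cv_b = coefficient_of_variation(mean=47.5, variance=121)   # Firm B

more_variable = "A" if cv_a > cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

Because the CV is a pure number, the same comparison would remain valid if the two firms reported wages in different currencies.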
III) Standard Score (Z-Scores)
A standard score for a sample value in a data set is obtained by subtracting the mean of the data set from the
value and dividing the result by the standard deviation of the data set. Basically, the standard
score (z-score) tells us how many standard deviations a specific value is above or below the
mean value of the data set. That is, the z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-score) the mean for the data set.
Z-score computed from the population: $Z = \frac{x - \mu}{\sigma}$
Z-score computed from the sample: $Z = \frac{x - \bar{x}}{s}$
Example 4.11: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
$\bar{x} = 8$, S = 3.8173; thus, $Z = \frac{x - \bar{x}}{S} = \frac{14 - 8}{3.8173} = 1.57$
The data value of 14 is located 1.57 standard deviations above the mean of 8, because the z-score
is positive.
Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Biology. The summary of the
scores in the two courses is given below.
Course       Average score   Standard deviation of the score
Statistics   51              12
Biology      72              16
In which course did the student score better as compared to his classmates?
Solution:
$Z_{Statistics} = \frac{66 - 51}{12} = 1.25$ and $Z_{Biology} = \frac{80 - 72}{16} = 0.5$. Since the z-score in Statistics is higher, we
can conclude that the student has scored better in the Statistics course relative to his classmates than
in the Biology course.
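Examples 4.11 and 4.12 can be checked with the following Python sketch. It uses the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation with the n - 1 divisor); the function name `z_score` is illustrative.

```python
import statistics

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Example 4.11: z-score of 14 within the sample
sample = [3, 8, 6, 14, 4, 12, 7, 10]
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)   # sample standard deviation (n - 1 divisor)
z_14 = z_score(14, sample_mean, sample_sd)

# Example 4.12: compare the student's standing in the two courses
z_stat = z_score(66, mean=51, sd=12)
z_bio = z_score(80, mean=72, sd=16)
better_course = "Statistics" if z_stat > z_bio else "Biology"

print(round(sample_sd, 4), round(z_14, 2))
print(z_stat, z_bio, better_course)
```

Although the raw Biology score (80) is higher than the Statistics score (66), the z-scores show that the Statistics result is further above its class average.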