CH 1
Examples:
Classification of students in DMU Campus according to their department.
The number of female and male students in this class.
Inferential Statistics:
This type of statistics is concerned with drawing statistically valid conclusions about the
characteristics of a population (large group) based on information obtained from a
sample (small group). That is, it is concerned with generalizing the results of a sample
to the entire population from which the sample is drawn.
It involves generalizing from sample to population using probabilities, performing
hypothesis testing, determining relationships between variables, and making predictions.
Example: Of 50 randomly selected people in the town of Gondar, 10 people had the last name
Abebe. An example of inferential statistics is the following statement: "About 20% of
all people living in Gondar have the last name Abebe."
4. Analysis of data: This stage involves extracting relevant information from the collected
data (such as the mean, median, mode, range and variance) using mathematical and statistical
tools, mainly elementary mathematical operations.
5. Interpretation of data: This stage involves drawing valid conclusions from the analyzed
data. That is, interpretation of data involves making inferences (drawing conclusions) based
on the analysis of the data.
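The measures named in the analysis stage can be computed with a few lines of Python. This is only a sketch: the data set below is hypothetical and chosen for easy arithmetic, and the variance shown is the sample variance.

```python
import statistics

# Hypothetical data set, used only for illustration
data = [2, 3, 3, 5, 7]

mean = statistics.mean(data)            # arithmetic average
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
value_range = max(data) - min(data)     # range = maximum - minimum
variance = statistics.variance(data)    # sample variance (divides by n - 1)

print(mean, median, mode, value_range, variance)
```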
1.1.3 Definition of some Statistical terms
Sampling: The process of selecting a sample from the population.
Population: The totality of things, objects, people, etc. about which information is being
collected.
Sample: A subset or part of a population selected to draw conclusions about the population.
Census survey: The process of examining the entire population. It is the total count
of the population.
Parameter: A descriptive measure (value) computed from the population and used to describe
the population. Example: population mean and population standard deviation.
Statistic: A measure (value) computed from the sample and used to describe the sample.
Sampling frame: A list of people, items or units from which the sample is taken.
Data: A collection of related facts and figures from which conclusions may be drawn.
Variable: A characteristic which changes from object to object and from time to time.
Sample size: The number of elements or observations to be included in the sample.
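The difference between a parameter and a statistic can be illustrated with a small sketch. The population here is hypothetical (the whole numbers 1 to 100), and the seed and the sample size of 10 are arbitrary choices made only so the example is reproducible.

```python
import random
from statistics import mean

# Hypothetical population of 100 values
population = list(range(1, 101))
parameter = mean(population)             # population mean (a parameter)

rng = random.Random(42)                  # fixed seed, so the sample is reproducible
sample = rng.sample(population, 10)      # sampling without replacement, sample size 10
statistic = mean(sample)                 # sample mean (a statistic)

print("parameter:", parameter)
print("statistic:", statistic)
```

The statistic generally differs from the parameter and changes from sample to sample, which is why inferential methods are needed to generalize from a sample.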
1.1.4 Applications, Uses and Limitations of Statistics
Application: Statistics is applied in almost all fields of human endeavor. It has become a
scientific framework for many fields, including education, agriculture, business and economics,
industry and health.
Uses of Statistics
Limitations of Statistics
Statistics deals only with aggregates of facts and not with individual data items.
Statistics deals only with quantitative data (information).
Statistical data are true only on average (approximately).
Statistics can easily be misused and therefore should be used by experts.
2. Quantitative variables: These variables are numeric in nature; they assume numeric values.
E.g.: Height, Family size
Quantitative variables can be further classified as discrete or continuous.
o Discrete variable: takes whole-number values and consists of distinct, recognizable
individual elements that can be counted. It is a variable that assumes a finite or countable
number of possible values. These values are obtained by counting (0, 1, 2, ...). E.g.:
Family size, number of children in a family, number of cars at a traffic light
o Continuous variable: takes any value, including decimals. Such a variable can
theoretically assume an infinite number of possible values. These values are obtained by
measuring.
E.g.: Height, weight, time, and temperature
Generally, the values of a variable can be obtained by counting for discrete variables, by
measuring for continuous variables, or by making categories for qualitative variables.
Ex: Classify each of the following as qualitative or quantitative and, if it is quantitative,
classify it as discrete or continuous.
a. Color of automobiles in a dealer's showroom.
b. Number of seats in a movie theater.
c. Classification of patients based on nursing care needed (complete, partial or seafarer)
d. Number of tomatoes on each plant on a field.
e. Weight of newly born babies.
Scales of Measurements/Levels of Measurements
According to the scale of measurement, data can be classified as nominal, ordinal, interval or
ratio data.
Nominal scale variables are qualitative variables that show the category of
individuals. They reflect classification into categories (names of groups) where there is no
particular order or qualitative difference between the labels. Numbers may be assigned to the
categories simply for coding purposes, but individuals cannot be compared based on the
numbers assigned to them. The only mathematical operation permissible on these variables
is counting. These variables:
Have mutually exclusive (non-overlapping) and exhaustive categories.
Have no ranking or order among the values of the variable.
Example: Gender, Religion, ID No, Ethnicity, Color
Ordinal scale variables are also qualitative variables, but their values can be ordered
and ranked. Ranking and counting are the only mathematical operations that can be done on the
values of the variables. There is no precise difference between the values (categories) of
the variable. E.g.: Academic qualifications (B.Sc., M.Sc., Ph.D.), Strength (very weak, weak,
strong, very strong), Health status (very sick, sick, cured)
Interval scale variables are quantitative variables for which a value of zero does not
indicate absence of the characteristic, i.e. there is no true zero. Zero indicates a low
value rather than nothing. There is a precise difference between the units of measurement (levels).
E.g.: Temperature; 0 °C does not mean there is no temperature, only that it is cold.
Ratio scale variables are quantitative variables for which a value of zero indicates
absence of the characteristic. All mathematical operations can be performed on the values
of the variables.
E.g.: Height, Weight, Income, Amount of yield, Expenditure, Consumption.
1.2 Methods of data collection and presentation
1.2.1 Methods of data collection
We have already explained what is meant by statistical data: numerical facts or measurements
obtained in the course of an enquiry into a phenomenon marked by uncertainty. Statistical
data may already be available or may have to be collected by an investigator or an agency.
Data are termed primary when they are collected for the first time by the investigator, and
secondary when they are taken from records or data already available.
Based on the source, data can be classified into two: Primary Data and Secondary Data.
Methods of primary data collection
In primary data collection, you collect the data yourself using methods such as interviews,
observations, laboratory experiments and questionnaires. The key point here is that the data you
collect is unique to you and your research and, until you publish, no one else has access to it.
There are many methods of collecting primary data and the main methods include:
Questionnaire: It is a popular means of collecting data, but it is difficult to design and often
requires many rewrites before an acceptable questionnaire is produced.
Interviewing: A technique that is primarily used to gain an understanding of the underlying
reasons and motivations for people's attitudes, preferences or behavior. Interviews can be
undertaken on a personal one-to-one basis or in a group. They can be conducted at work, at
home, in the street, in a shopping center, or at some other agreed location.
Observation: It involves recording the behavioral patterns of people, objects and events in a
systematic manner.
Diaries: A diary is a way of gathering information about the way individuals spend their time on
professional activities. Diaries in this sense are not records of engagements or personal
journals of thought. They can record either quantitative or qualitative data, and in management
research they can provide information about work patterns and activities.
Laboratory experiment: Conducting laboratory experiments in fields such as the chemical and
biological sciences.
Methods of secondary data collection
Secondary data analysis can be literally defined as second-hand analysis. It is the analysis of
data or information that was either gathered by someone else (e.g., researchers, institutions, other
NGOs, etc.) or for some purpose other than the one currently being considered, or often a
combination of the two. Some sources of secondary data are government documents,
official statistics, technical reports, scholarly journals, trade journals, review articles, reference
books, research institutes, universities, hospitals, libraries, library search engines and
computerized databases.
Having collected and edited the data, the next important step is to present it in a
comprehensible, condensed and suitable form that helps in interpreting it. Data presentation
is a statistical procedure of arranging and putting data in the form of tables, graphs, charts
and diagrams. The need for proper presentation arises because statistical data in their raw
form are not easy to understand.
Tabular presentation
A statistical table is an orderly and systematic presentation of numerical data in rows and
columns. Rows (stubs) are horizontal and columns (captions) are vertical arrangements. Using
tables to organize data involves grouping the data into mutually exclusive categories of the
variable and counting the number of occurrences (frequency) in each category.
Definitions:
Raw data: Data collected in their original form are called raw data.
Frequency: The number of times a certain value occurs, or the number of values that fall into
a specific class of the distribution.
Frequency distribution: is the organization of raw data in table form using classes and
frequencies. A frequency distribution (or frequency table) lists classes (or categories) of values,
along with frequencies. The table consists of two columns: one contains the list of possible
values and/or classes, and the other contains the number of times each of those values or classes
occurred in the data.
1) Categorical frequency Distribution: is a frequency distribution in which the data is only nominal
or ordinal.
Example: A social worker collected the following data on marital status for 25 persons.
(M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Q. Construct a frequency distribution.
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status:
M, S, D, and W. These types will be used as the classes of the distribution. We use the following
procedure to construct the frequency distribution.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class using
$\% = \frac{f}{n} \times 100$, where f = frequency of the class and n = total number of values.
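Percentages are not normally a part of a frequency distribution, but they can be added since they are
used in certain types of diagrams such as pie charts.
Combining all the steps, one can construct the following frequency distribution.
Class (1)   Tally (2)   Frequency (3)   Percent (4)
M           //////      6               24
S           ///////     7               28
D           ///////     7               28
W           /////       5               20
Total                   25              100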
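The tallying and percentage steps above can be sketched in Python. The data are the 25 marital-status codes from the example; the variable names are illustrative only.

```python
from collections import Counter

# Marital-status data from the example (M=married, S=single, W=widowed, D=divorced)
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

counts = Counter(data)           # frequency f of each class
n = sum(counts.values())         # total number of values

# Percentage of each class: (f / n) * 100
percentages = {status: 100 * f / n for status, f in counts.items()}

for status in ["M", "S", "D", "W"]:
    print(f"{status}: f = {counts[status]}, percent = {percentages[status]:.0f}%")
```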
Example: The following data represent the marks of 20 students.
80 76 90 85 80 65 60 63 74 75
70 60 62 70 85 76 70 70 80 85
Mark        60  62  63  65  70  74  75  76  80  85  90
Frequency    2   1   1   1   4   1   1   2   3   3   1
Each individual value is presented separately, which is why it is called an ungrouped frequency
distribution.
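An ungrouped frequency distribution like this one can be checked by counting each distinct value directly. The sketch below uses the 20 marks from the example.

```python
from collections import Counter

# Marks of the 20 students from the example
marks = [80, 76, 90, 85, 80, 65, 60, 63, 74, 75,
         70, 60, 62, 70, 85, 76, 70, 70, 80, 85]

frequency = Counter(marks)
table = sorted(frequency.items())    # (mark, frequency) pairs in increasing order of mark

for mark, f in table:
    print(mark, f)
```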
3) Grouped frequency Distribution: When the range of the data is large, the data must be grouped
into classes that are more than one unit in width. A grouped frequency distribution is a frequency
distribution in which several values are grouped into one class.
Definitions
Class limits: Separate one class in a grouped frequency distribution from another. The limits
can actually appear in the data, and there are gaps between the upper limit of one class and
the lower limit of the next.
Unit of measurement (U or d): The distance between two possible consecutive measures, or
the gap between two successive classes. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
Class boundaries: Separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in
the data. There is no gap between the upper boundary of one class and the lower boundary of the
next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit, and the upper class boundary is found by adding U/2 to the corresponding
upper class limit.
Class width: The difference between the upper and lower class boundaries of any class. It is
also the difference between the lower limits of any two consecutive classes, or the difference
between any two consecutive class marks.
Class mark (midpoint): The average of the lower and upper class limits, or the average
of the lower and upper class boundaries.
Cumulative frequency above (more than type): The total frequency of all values greater than or
equal to the lower class boundary of a given class.
Cumulative frequency below (less than type): The total frequency of all values less than or
equal to the upper class boundary of a given class.
Relative cumulative frequency (rcf): The cumulative frequency divided by the total
frequency.
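As an illustration of these definitions, the sketch below derives class boundaries, class marks and class widths from a set of class limits. The three classes are hypothetical and assume whole-number data, so that U = 1.

```python
# Hypothetical class limits for whole-number data
limits = [(6, 11), (12, 17), (18, 23)]
U = 1   # unit of measurement: gap between an upper limit and the next lower limit

# Boundaries: subtract U/2 from each lower limit, add U/2 to each upper limit
boundaries = [(lower - U / 2, upper + U / 2) for lower, upper in limits]

# Class mark: average of the lower and upper class limits
marks = [(lower + upper) / 2 for lower, upper in limits]

# Class width: upper class boundary minus lower class boundary
widths = [ucb - lcb for lcb, ucb in boundaries]

print(boundaries)
print(marks)
print(widths)
```

Note that the upper boundary of each class coincides with the lower boundary of the next, so the boundaries leave no gaps.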
Steps for constructing Grouped frequency Distribution
1. Find the largest and smallest values and compute the range: R = Maximum - Minimum.
2. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
$k = 1 + 3.322 \log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
3. Find the class width by dividing the range by the number of classes: $w = \frac{R}{k}$. It is also the
difference between the upper and lower class boundaries of a class, that is, w = UCB - LCB.
4. Pick a suitable starting point less than or equal to the minimum value. The starting point
is called the lower limit of the first class. Continue to add the class width to this lower
limit to get the rest of the lower limits.
5. To find the upper limit of the first class, subtract U from the lower limit of the second
class. Then continue to add the class width to this upper limit to find the rest of the upper
limits.
6. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units
to the upper limits.
7. Tally the data.
8. Find the frequencies.
9. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may
not be necessary to find the cumulative frequencies.
10. If necessary, find the relative frequencies and/or relative cumulative frequencies
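Steps 1 to 5 above can be sketched as two small helper functions. The function names are illustrative, and the sketch assumes whole-number data (U = 1) with the minimum value as the starting point. As an extra safeguard that is not part of the steps above, the width is increased by one unit if the last class would otherwise stop short of the maximum.

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, k = 1 + 3.322 log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_limits(minimum, maximum, k):
    """Class width and (lower, upper) class limits for whole-number data (U = 1)."""
    width = math.ceil((maximum - minimum) / k)    # w = R / k, rounded up
    if minimum + k * width - 1 < maximum:         # last upper limit must reach the maximum
        width += 1
    lowers = [minimum + i * width for i in range(k)]
    uppers = [lower + width - 1 for lower in lowers]   # next lower limit minus U
    return width, list(zip(lowers, uppers))

k = sturges_classes(20)
width, limits = class_limits(6, 39, k)
print(k, width, limits)
```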
Example: Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest values, H = 39 and L = 6, and find the range: R = H - L = 39 - 6 = 33.
Step 2: Find the number of classes using Sturges' formula:
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 20 = 5.32 \approx 6$ (rounding up).
Step 3: Find the class width: $w = \frac{R}{k} = \frac{33}{6} = 5.5 \approx 6$ (rounding up).
Step 4: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 5: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
So, combining step 4 and step 5, one can construct the following classes.
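Class limits: 6-11, 12-17, 18-23, 24-29, 30-35, 36-41
Step 6: Find the class boundaries; e.g. for the first class, 6 - U/2 = 5.5 and 11 + U/2 = 11.5.
Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so, one can
obtain the following classes.
Class boundaries: 5.5-11.5, 11.5-17.5, 17.5-23.5, 23.5-29.5, 29.5-35.5, 35.5-41.5
Step 7 & 8: Tally the data and write the numeric values of the tallies in the frequency column.
Step 9 & 10: Find the cumulative, relative and/or relative cumulative frequencies.
The complete frequency distribution is given below.
Class limit  Class boundary  Class mark  Tally     Freq.  Cf (less than type)  Cf (more than type)  rf    rcf (less than type)
6-11         5.5-11.5        8.5         //        2      2                    20                   0.10  0.10
12-17        11.5-17.5       14.5        //        2      4                    18                   0.10  0.20
18-23        17.5-23.5       20.5        ///////   7      11                   16                   0.35  0.55
24-29        23.5-29.5       26.5        ////      4      15                   9                    0.20  0.75
30-35        29.5-35.5       32.5        ///       3      18                   5                    0.15  0.90
36-41        35.5-41.5       38.5        //        2      20                   2                    0.10  1.00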
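The frequency, cumulative frequency and relative frequency columns for this example can be computed as follows. This is a sketch using the 20 observations and the six class limits found in steps 4 and 5; the variable names are illustrative.

```python
from itertools import accumulate

# The 20 observations from the example
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
n = len(data)

# Frequency: number of observations between the lower and upper limit of each class
freq = [sum(lower <= x <= upper for x in data) for lower, upper in limits]

cf_less = list(accumulate(freq))                          # cumulative frequency, less than type
cf_more = [n - c + f for c, f in zip(cf_less, freq)]      # cumulative frequency, more than type
rf = [f / n for f in freq]                                # relative frequency
rcf_less = [c / n for c in cf_less]                       # relative cumulative frequency

for row in zip(limits, freq, cf_less, cf_more, rf, rcf_less):
    print(row)
```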
Diagrams are appropriate for presenting discrete data. The choice of a particular form among
the different possibilities depends on personal preference and/or the type of data.
The most commonly used diagrammatic presentations for discrete as well as qualitative data are
pie charts, bar charts and pictograms.
Pie chart: A pie chart is a circle divided into sections or wedges according to the
percentage of frequencies in each category of the distribution. It shows the distribution of
the values of a variable (absolute or relative) as slices of a pie; the frequency determines
the size of the slice.
The proportion of a category can be expressed either as a percentage or as an angle. That is, the degree of
central angle of a category = (amount of the category / total amount) * 360°, and the proportion of a
category = (frequency of the category / total frequency) * 100%.
Solutions:
Step 1: Find the percentage and/ or degree for each class.
Step 2: Using a protractor and compass, graph each section and write its name and the
corresponding percentage.
Class   Frequency  Percent  Degree
Men     2500       25       90
Women   2000       20       72
Boys    1500       15       54
Girls   4000       40       144
[Figure: pie chart with slices Men 25%, Women 20%, Boys 15%, Girls 40%]
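The percentage and central angle of each slice follow directly from the two formulas given for the pie chart. In the sketch below, the frequency for Girls is an assumption: it is inferred from the 40% slice in the chart together with Men = 2500 being 25% of the total.

```python
# Frequencies by category; the Girls value (4000) is inferred from the 40% slice,
# not read from a table row, so treat it as an assumption.
counts = {"Men": 2500, "Women": 2000, "Boys": 1500, "Girls": 4000}
total = sum(counts.values())

# Percentage = (frequency / total) * 100; central angle = (frequency / total) * 360
percent = {name: 100 * f / total for name, f in counts.items()}
degrees = {name: 360 * f / total for name, f in counts.items()}

for name in counts:
    print(f"{name}: {percent[name]:.0f}%, {degrees[name]:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before drawing the slices with a protractor.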
Bar charts: Bar charts are used to represent and compare the frequency distributions of discrete
variables and attributes or categorical series. When data are represented using a bar chart, all the bars must
have equal width and the distance between bars must be equal, while the length of each bar varies in proportion
to the size (frequency) of the item. The height of a bar represents the frequency and the base represents the
category. Bars can be drawn either vertically or horizontally. There are different types of bar charts,
the most common being:
i. Simple bar chart: A one-dimensional chart in which each bar represents the whole of a
magnitude. The bars are thick lines (narrow rectangles) of the same breadth, and the height or
length of each bar indicates the size (frequency) of the figure represented.
Example: The following data represent sales by product, 1957-1959, of a given company for three
products A, B and C.
Product  Sales ($) in 1957  Sales ($) in 1958  Sales ($) in 1959  Total
A        12                 14                 18                 44
B        24                 21                 18                 63
C        24                 35                 54                 113
Total    60                 70                 90                 220
Solution:
[Figure: simple bar chart "Sales by Product Type"; horizontal axis: Product (A, B, C); vertical axis: Sales]
[Figure: simple bar chart "Sales by Year"; horizontal axis: Year (1957, 1958, 1959); vertical axis: Sales]
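The bar heights for the two simple bar charts are the row and column totals of the sales table. The sketch below recomputes them from the individual entries of the table and prints a rough text version of each chart; the dictionary layout is only one possible representation.

```python
# Sales ($) by product and year, taken from the individual entries of the table
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# Bar heights for "Sales by Product Type" (row totals)
by_product = {product: sum(per_year.values()) for product, per_year in sales.items()}

# Bar heights for "Sales by Year" (column totals)
by_year = {year: sum(sales[product][year] for product in sales) for year in years}

# Rough text bar charts: one '#' per 5 units of sales
for label, value in list(by_product.items()) + list(by_year.items()):
    print(f"{label:>5} {'#' * (value // 5)} {value}")
```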
ii. Component bar chart: When there is a desire to show how a total (or aggregate) is divided
into its component parts, we use a component bar chart. The bars represent the total value of a
variable, with each total broken into its component parts, and different colors or designs are
used for identification. This is done by dividing the bars into parts representing the
components and shading them accordingly.
Example: Draw a component bar chart to represent sales in dollars by product type from 1957 to
1959.
Solutions:
iii. Multiple bar chart: In this type of chart, the component figures are shown as separate bars adjoining
each other. The height of each bar represents the actual value of the component figure. It is used when the
distributional pattern of more than one variable is depicted and comparisons of each component are desired.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solutions:
Histogram
Frequency polygon: A line graph of class frequencies plotted against the midpoints of the classes. The
frequencies are placed along the vertical axis and the class midpoints along the horizontal
axis, and the points are connected with lines.
Example: Draw a frequency polygon for the above data.
[Figure: "Frequency Polygon"; horizontal axis: Class Midpoint (8.5, 14.5, 20.5, 26.5, 32.5, 38.5); vertical axis: Frequency]
Ogive (cumulative frequency polygon): A line graph that represents the cumulative frequencies
(less than or more than type) plotted against the upper or lower class boundaries respectively. That is,
class boundaries are plotted along the horizontal axis and the corresponding cumulative
frequencies along the vertical axis. The points are joined by a freehand curve.
To construct an ogive:
Compute the less than and more than cumulative frequencies of the distribution.
Prepare a graph with the cumulative frequency on the vertical axis and the true class
limits (class boundaries) of the intervals scaled along the horizontal axis (X-axis).
Mark the intersection points of the class boundaries and the cumulative frequencies with a
dot.
Connect the intersection points using a line (curve).
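The points to be plotted for the two ogives can be computed before any drawing is done. The sketch below assumes the class boundaries 5.5 to 41.5 from the grouped example and the class frequencies obtained by tallying its 20 observations; plotting the points is left to the reader.

```python
from itertools import accumulate

# Class boundaries and frequencies for the grouped example (frequencies tallied from its data)
boundaries = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
freq = [2, 2, 7, 4, 3, 2]
n = sum(freq)

# Less than type: cumulative frequency paired with each upper class boundary
cf_less = list(accumulate(freq))
less_than_points = list(zip(boundaries[1:], cf_less))

# More than type: cumulative frequency paired with each lower class boundary
cf_more = [n - below for below in [0] + cf_less[:-1]]
more_than_points = list(zip(boundaries[:-1], cf_more))

print(less_than_points)
print(more_than_points)
```

The less than ogive rises from left to right and the more than ogive falls, so the two curves cross where the cumulative frequency is about half of the total.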