Edu 821 Statistical Methods I PDF
SCHOOL OF EDUCATION
COURSE DEVELOPMENT
Course Developer
Professor M. I. Shuaibu
Federal Ministry of Education
Education Sector Analysis
Education Annex
Abuja
Unit Writers
Professor M. I. Shuaibu
Federal Ministry of Education
Education Sector Analysis
Education Annex
Abuja
Dr. U. S. A. Osuji
National Open University of Nigeria
Lagos
Programme Leader
Course Coordinator
Dr. Osuji, U.S.A
School of Education
NOUN HQ-Lagos
NOUN 2
EDU 701
STATISTICAL METHODS I
COURSE GUIDE
CONTENTS
DESCRIPTIVE STATISTICS
The Mode
INTRODUCTION
EDU 701: Statistical Methods I is a one-semester course for all postgraduate students
pursuing a master's degree in education at the National Open University of Nigeria
(NOUN). It can also serve as reference material for students in other schools or those
doing research in other fields. It is a three-credit course which is compulsory for all
education students at the master's level.
The course will consist of 13 units which include the nature of data/scientific
observations, basic concepts in statistics, statistical notations/shorthand,
measurement scales, organisation and presentation of data, graphical representation
of data, measures of central tendency, measures of variability/dispersion, shapes of
curves, some measures of association and agreement and the standard scores. The
material has been developed to suit learners in Nigeria by using examples from the
local environment.
The course is designed for people who have earned a professional qualification in
education. Most of them will have been teaching for some time or aspiring to
leadership positions in their various places of work. Others will already be in
management or leadership positions as HODs, principals, supervisors, etc., where
they will be expected to:
The overall aim of EDU 701: Statistical Methods I is to introduce you to descriptive
statistics. During the course you will learn the meaning and types of statistics; types
of data, scales and variables; and the organisation, presentation and representation of
data using tables, graphs, charts, etc. You will also learn how to describe data using
different measures, such as measures of central tendency, variability and association.
In addition, you will learn the types of curves and their properties, and how to
transform raw scores into standard scores.
Statistics as a course is very necessary for you because there is nothing you will do
in education which does not require your knowledge of it. Indeed there is no human
COURSE AIMS
The main aim of the course is to introduce you to descriptive statistics and give you
the understanding of how to present and represent your data and to describe your
observations in data form.
COURSE OBJECTIVES
Each unit of the course has specific objectives which are included at the beginning of
the unit. You are required to read them before you start working through the
unit.
You should always refer to them as you work through and at the end of the unit to
check your progress. However the objectives of the course are as follows:
COURSE MATERIALS
These include the course guide, course material, textbooks, assignment files, etc. In
addition, you should have a calculator, a mathematical set, graph sheets and statistical
tables.
Study units
There are 13 study units in this course. They are:
The course is designed to last for 15 weeks or a semester. It implies that each unit
should be studied in one week. The reference books are listed after each unit.
Statistics books are available in the markets and bookshops.
Assessment
Assessment in this course shall be made up of two parts. These are:
1. Tutor-marked assignments (TMAs), which carry a total of 40%. At least six
TMAs should be submitted, out of which the best four will be used for
assessment and grading.
2. Examination: the written examination, which shall last for three hours at the
end of the course, carries 60%.
Both the TMAs and the examination must be passed at a minimum percentage before
you can be successful in the course. The examination will consist of questions which
reflect the types of self-assessment exercises and the TMA questions.
You should use the time between finishing the last unit and sitting for the
examinations for your revisions. Information from all parts of the course will be
examined.
ASSESSMENT MARKS
Assessment          Marks
Assignments 1-6     At least six assignments to be submitted, out of which
                    four will be used, at 10% each = 40% of course marks
Final examination   60% of overall course marks
Total               100% of course marks
Open and distance learning (ODL) is not the same thing as face-to-face learning. In
ODL there is no lecturer; the self-learning material replaces the lecture. This
means that you can study the materials at your own time and place. The
self-learning material can do everything the lecturer can do for you, if you follow it
carefully.
Each of the units follows the same pattern. The format is as follows:
i. Introduction
ii. Objectives
iii. The main body
iv. Conclusion
v. Summary
vi. TMA
vii. References
To work through without any hitch, follow the under-listed practical strategies.
j. On completing the last unit, review the course and prepare yourself for
the final examination.
If you run into trouble, contact your tutorial facilitator at the study centre or the
course co-ordinator at the Headquarters of the National Open University of Nigeria,
Victoria Island, Lagos. Note that both the facilitator and the co-ordinator are there
to help you. Do not hesitate to call and ask them for help.
Tutorials are provided in support of this course. You will be notified of the dates,
times and locations of these tutorials, together with the names and phone numbers of
your tutorial facilitator and the course co-ordinator, as soon as you are registered
with the National Open University of Nigeria at your study centre. Do not hesitate to
contact your tutor or the course co-ordinator if you do not understand any part of the
study units or the assigned readings, if you have difficulty with the self-tests or
exercises, or if you have a question or problem with an assignment, with your tutor's
comments on an assignment or with the grading of an assignment.
You should try to attend the tutorials regularly. This is the only way you can have
face to face contact or interaction with the facilitator who is there to answer your
questions.
SUMMARY
EDU 701: Statistical Methods I is one of the two courses you will work through
in your programme.
EDU 701 is designed to teach you descriptive statistics. Upon the completion of
this course you will be able to answer such questions as:
What is statistics?
What are the types of data dealt with in statistics?
What are the purposes of statistics?
What are the types of statistics?
Why do we use samples instead of populations?
What are the measurement scales?
What are the variables in statistics?
How do you organise data in statistics?
How do you present data?
How do you represent data?
How does a bar chart differ from a histogram?
What are the measures of central tendency?
What are the measures of dispersion?
If you complete the course successfully, you will have been equipped with
the basic knowledge of descriptive statistics. This means that you will be able to
answer even more questions than are given above. You will also be equipped to do
some arithmetic, which you can do easily with your calculator.
We wish you success with the course, and we hope you will find it very interesting.
Enjoy your programme at the National Open University of Nigeria. We wish you every
success in your future.
Unit 1
1.0 INTRODUCTION
Welcome to, perhaps, your first course in statistics and statistical methods.
This first unit introduces you to the raw material you will be working with,
namely data: how they are derived and how they are used in our daily lives.
The relationship between data and statistics and the various types of statistics are
then introduced. Finally, the approach to the statistical methods course is
presented.
2.0 OBJECTIVES
The raw material of all statistical work is data. Whether you are looking at the
enrolment figures in our school system, the number of participants from
various states in a national conference, the number of successful candidates
in a public examination or the salaries of teachers, you are presented with a
large amount of information, often in the form of numerical figures.
Let us quickly look at an example of how data are used in our daily lives.
A restaurant has prepared three different menus for lunch for a group of
people attending a two-day conference.
Day 1 There was jollof rice, eba, and fried yam and plantain. As part of the
organisers' preparation for the group lunch, participants were
required to write their names and indicate their preferred lunch in one
of three columns. By the time full attendance was taken, the
following observation was made:
Exercise 1.1
Identify two instances in which you have come across data today.
So the use of data is pervasive. We use data every day in our lives, although we
do not always describe them as data. Note that, for brevity, we can use A, B
and C to represent the varieties that were available.
Menu            A    B    C
No of persons   31   7    18
The supervisor can now make further decisions on the basis of this
information, such as how many attendants must be engaged to provide quick
service and how many chairs must be provided. The manner
in which we have handled this information so far is referred to as statistics.
Now let us consider our example in Table 1 further. Suppose that on the following
day the same menu of three varieties A, B and C is offered, and the choices
this time around are as follows:
Menu            A    B    C    Total
No of persons   21   17   16   54
Now the supervisor has an opportunity to compare the two sets of data, Day 1
and Day 2. He can do so by again looking at the columns of totals. He can
also compare the numbers eating the same food in the two days and make a
number of inferences. For instance, he might infer that the two participants
who failed to turn up for lunch are dissatisfied with the services. He may
want to verify this by seeking further evidence. He could also infer that eba
has gained popularity and jollof rice has lost patronage. Again, the data
provides him a number of options.
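The supervisor's comparison of the two days can be sketched in a few lines of Python. This is only an illustration: the variable names are our own, and the labels A, B and C are assumed to stand for jollof rice, eba, and fried yam and plantain, in the order in which the menu was listed.

```python
# Lunch choices on the two days (figures taken from the two tables above).
# Assumption: A = jollof rice, B = eba, C = fried yam and plantain.
day1 = {"A": 31, "B": 7, "C": 18}
day2 = {"A": 21, "B": 17, "C": 16}

# Total number of people who ate lunch on each day.
total_day1 = sum(day1.values())
total_day2 = sum(day2.values())

# Change in the number of people choosing each dish from Day 1 to Day 2.
change = {dish: day2[dish] - day1[dish] for dish in day1}

print("Day 1 total:", total_day1)
print("Day 2 total:", total_day2)
print("Fewer people on Day 2:", total_day1 - total_day2)
print("Change per dish:", change)
```

A positive change means a dish gained patronage and a negative change means it lost patronage, which is the kind of comparison the supervisor makes by eye.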
In addition, there is hardly any discipline today even in education, the arts
and social sciences that does not require some level of statistics for its
understanding. Research reports in most disciplines are enriched by
statistics.
Lastly, one requirement for higher degree in most disciplines is that you carry
out and report your own independent research.
School managers are constantly faced with situations in which they have to
make inferences from observations. For instance, the proprietor of a private
school took a week's attendance and found the following:
                          Week days
Class           Monday  Tuesday  Wednesday  Thursday  Friday  No in Class
Primary one     38      39       38         36        31      40
Primary two     35      34       33         32        30      35
Primary three   45      41       40         43        40      45
Primary four    36      35       38         39        18      40
Primary five    41      42       42         40        36      42
Primary six     28      26       29         21        27      31
If she makes similar observations over several weeks, the proprietor can
deduce from these that there is hardly ever full attendance on a school day. She may
also deduce that attendance is lowest towards the weekend.
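The proprietor's deductions can be checked with a short Python sketch. It assumes that the last figure in each row of the attendance table is the number of pupils in the class and that the other five figures are the attendance from Monday to Friday; the variable names are our own.

```python
# One week's attendance (figures from the table above).
# Each entry: class -> (number in class, [Mon, Tue, Wed, Thu, Fri]).
# Assumption: the last figure in each table row is the class size.
attendance = {
    "Primary one":   (40, [38, 39, 38, 36, 31]),
    "Primary two":   (35, [35, 34, 33, 32, 30]),
    "Primary three": (45, [45, 41, 40, 43, 40]),
    "Primary four":  (40, [36, 35, 38, 39, 18]),
    "Primary five":  (42, [41, 42, 42, 40, 36]),
    "Primary six":   (31, [28, 26, 29, 21, 27]),
}
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Total number of pupils enrolled in the school.
school_size = sum(size for size, _ in attendance.values())

# Total number of pupils present in the whole school on each day.
daily_totals = [
    sum(present[i] for _, present in attendance.values())
    for i in range(len(days))
]

# The day with the lowest total attendance.
lowest_day = days[daily_totals.index(min(daily_totals))]

for day, total in zip(days, daily_totals):
    print(day, total, "out of", school_size)
print("Lowest attendance:", lowest_day)
```

On no day does the school-wide total reach the number enrolled, and the lowest total falls on Friday, which agrees with the deductions described above for this one week.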
You do not always know what type of research you will carry out, whether it
is a documentary analysis, a historical analysis or an experimental study.
You will, however, find a knowledge of statistics most valuable in the
reading, conception, design and analysis of your work.
Exercise 1.2
You will find some of them esoteric and extremely difficult to comprehend.
You should not worry, for we shall be coming across some of them and
indeed learn to handle them in this course.
The purposes listed in section 1.3 give rise to three types of statistics, namely:
You will notice that this course is described as statistical methods, as opposed
to statistics, which we have been discussing. Indeed, there is a complementary
Statistical Methods II course which you may be required to take during the
second semester. The courses are so called in order to draw attention to the fact
that the aim of this course is not to make you a statistician. Rather, you are
expected to grasp the methods statisticians use to process information, the
circumstances in which one approach is used in preference to another and the
limitations accompanying any so-called hard statistical facts. You will be
expected to pay special attention to the procedures, rather than the theory, of
statistics.
4.0 CONCLUSION
In this unit you should have learnt the concept of data as the foundation of
all statistical analyses, and how data are derived from repeated events,
information or scientific observation. You should also have learnt to define
statistics, and to state the purposes and types of statistics.
The statistical methods course is concerned with how to handle data derived
in educational contexts. Clearly, several such contexts exist in education
and other social sectors, so the statistical methods course is applicable in a
wide variety of disciplines. These contexts and related concepts shall be
examined in the following unit.
5.0 SUMMARY
Unit 2
1.0 INTRODUCTION
2.0 OBJECTIVES
We have already asserted that data are the fundamental raw material of
statistics and that data are usually numerical information about objects, events,
etc. We cited the example of a school proprietor who took a week's attendance
in her private school. Clearly, from the observations or data generated in
Table 3, the proprietor can come to conclusions only for that week.
If she wanted to know the attendance throughout the school term
or school year, she would have to take attendance every week during the
whole term or school year, as the case may be. Thus, the one week's
attendance is only part of a larger body of possible information. When a
statistician deals with the whole, rather than part, of the information about an
object, event or phenomenon, he is dealing with a population. A population is all
possible objects, beings, events or incidences with the same characteristics that
are the focus of the observer.
We are all conversant with the national census. What is the principal focus or
objective of a national census exercise? The population of a country, the
population of Nigeria in 1991 was .., according to the National
Population Commission ( ). This means that as of the time the national
headcount was done, there were . human beings (males & females,
foreigners, rural and urban dwellers) in the political entity called Nigeria.
Note that in this case we were looking only at the number of human beings,
not animals or houses. Note also that human beings can still be categorized
as males and females, nationals and expatriates, rural and urban dwellers,
native speakers of Hausa, Igbo, Efik and other languages, etc.
The point would be in error because although the Nupe culture is Nigerian,
not all Nigerian cultures are to be found with the Nupes. If a point is to be
made about the Nigerian culture, then we must select or choose a sample that
would be representative of Nigeria i.e. one in which all the characteristics of
Nigerian cultures are present.
Exercise 2.1
population are represented in a sample, the more closely decisions made from
the sample approximate the true position of the population.
Suppose that soon after you saw the girls, the school bell rings for a break
and all the other girls are wearing white blouse over brown shirt, you will
have to admit your error of judgement and change your conclusion.
It is extremely useful to ensure that samples are drawn in such ways that
would represent the characteristics associated with the population of interest.
3.3 MEASUREMENT
Thus any object, event or phenomenon with a scientific study must have
some measurable property/quality. It is these measurements that give rise to
the data which the statistician uses for his/her analysis. Thus, performance in
a test is measured (scored) against the criteria defined in the marking scheme.
ERRORS OF MEASUREMENT
Tailors, who are regularly involved in measurement, would tell you that no
two persons who measure an individual's trouser length using the same tape
would give exactly the same value, if they really want to be accurate.
Similarly, two teachers marking an essay are unlikely to give the same score
to the writer even when the teachers use the same marking scheme.
Besides, if an individual measures the same thing over and over again using
the same instrument, he/she will begin to notice slight differences. If you
take a record of your blood pressure under normal conditions for many days,
you are bound to find slight variations. Yet you could still be described as
normal.
Exercise 2.2
ii. List all possible sources of the variation which you may have
observed.
5.0 SUMMARY
In this unit, you should have learnt the concepts of population and sample.
While a population refers to the totality of members of a well-defined group
(objects, beings, things, events, measurements, phenomena), a sample is a part
thereof. A sample is said to be representative of a population if the
sample has all the characteristics of its parent population. When numerical
properties of a population are known or derived, we say we have its
parameters. When numerical properties of a sample are known or derived, we
say we have its statistics, as opposed to the discipline of Statistics spelt with a
capital letter S.
Parameters and statistics are derived from some kind of measurement, which
in turn gives rise to data. The process of measuring depends on what is being
measured. We also learnt that measurement has inbuilt errors. Although
errors can be minimized, they cannot be entirely eliminated from
measurement. Errors are to be recognised.
UNIT 3
STATISTICAL NOTATIONS/SHORTHAND
1.0 INTRODUCTION
In the two earlier units, you learnt a number of related concepts. You learnt
how scientific observations are transformed into data for the purpose of
statistical analysis. You also learnt the concepts of population and sample
and the terms that are used to describe the measurements made
therefrom.
Since you will be dealing with a lot of numerical data, some of them may
have to be handled the way mathematicians handle them. In others, you will
quite often have to add, subtract, multiply and divide large numbers, using the
notations/symbols or shorthand which statisticians use to describe the
operations that you may need to carry out with the numbers. We shall restrict
ourselves to operations you may have to carry out with samples and their
estimates or statistics.
2.0 OBJECTIVES
If you have a third set of scores, it will be designated with letter Z and so on.
Indeed any Roman capital letter may be used to denote a set of scores from
one variable.
So in summary,
Likewise,
X is a subset of X
Y is a subset of Y
Z is a subset of Z
If you have more than 3 groups you can use capital letters, P, Q, R, S, T, U,
V, etc. In effect, you are simply assigning names to the variables.
In many statistics textbooks you would find different letters or symbols used
to denote a set of scores in a variable. Once you have learnt one system that
is convenient to you, the best practice is to stick to one system consistently.
3.2 You will find, however, that certain Roman letters are
consistently used by many authors to denote particular types of variable. For
instance, the letter N is often used to represent the total number of members
If, as we learnt earlier, we want to look at the scores of only 7 students
out of the 50 in the class, we can write n = 7, meaning we are concerned with
n (7), a smaller number taken out of the larger number (50) in the class.
Statisticians often have to deal with samples and populations. If N represents
the number in the population, then n is the sample size.
If in a class of 50, six students earned the same score, say 57, the frequency
(f) of the score of 57 is 6 (f=6). As you can see, we are already beginning to
learn the language of handling numbers. We are learning to describe large
sets of numbers with which we may be confronted from time to time.
You are no doubt familiar with these operations. You may not be so familiar
with the square root operation ($\sqrt{\ }$). When you are asked to take the square
root of a number, say 25, it should be interpreted to mean "what number do
you multiply by itself to get 25?" The answer, of course, is 5, for 5 x 5 = 25;
therefore $\sqrt{25} = 5$.
This process will prove helpful in this course. The reverse operation is
described as "take the square of", and it is written with the figure 2 as a
superscript to the figure you want to square, thus:
$7^2 = 49$
$6^2 = 36$
$9^2 = 81$
$10^2 = 100$
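As a quick check of the two operations, squaring and taking the square root can be tried in Python. This is a minimal sketch using the standard `math` module; the variable names are our own.

```python
import math

# Squaring: multiply each number by itself (the ** operator raises to a power).
squares = {number: number ** 2 for number in (7, 6, 9, 10)}
for number, square in squares.items():
    print(number, "squared is", square)

# Square root: the number which, multiplied by itself, gives the original.
root = math.sqrt(25)
print("The square root of 25 is", root)
```

Note that squaring and taking the square root undo each other: the square root of 49 is 7, and 7 squared is 49.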
Exercise 3.1
(a) What are the square roots of: 121, 100, 400, 265, 39
3.4 Let us now try to combine these concepts. Quite often the statistician wants
to add one set of scores to another, as would be the case when the continuous
assessment (CA) scores are added to the end-of-year scores in English
language, in order to describe the overall year-round performance of a
student in one subject. If the set of scores in the CA is denoted as X, the set
of scores representing the end-of-year examination may be denoted as Y. The
operation to add these sets of scores together can be denoted simply as X + Y.
X + Y means add the set of X scores (CA) to the set of Y scores (end of
year).
Clearly, what needs to be done is brief and concise. Everybody who handles
statistics would immediately understand what operation needs to be done. If
three sets of scores X, Y, Z, are to be added, it is denoted by X+ Y+Z.
Another very common operation is to add up the set of scores in one variable.
If we want to know how much money we have in our hands in a class, we would have
to add up what each and every one has with him/her. The procedure would be
to write down the names of everybody in the class and, against each name,
state the amount of money with him/her. Similarly, you obviously have to
add up the scores on each item of an examination to get a total score for a
candidate in that exam; that is, the scores in a group of scores (a variable)
are to be added.
Let us say X represents the set of the odd numbers between 1 and 10
inclusive, that is 1,3,5,7,9 and Y represents the set of even numbers (that is
numbers divisible by 2) between 1 and 10 inclusive, thus, Y represents
2,4,6,8,10
For convenience, you may re-write these numbers in columns rather than
rows to make them conform with our customary or more familiar approach to
adding numbers,
Thus,
        X     Y
        1     2
        3     4
        5     6
        7     8
        9    10
Total  25    30
Similarly,
if Y represents 2, 4, 6, 8 and 10,
$\therefore \sum Y = 30$.
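Adding up a set of scores, and adding two sets of scores together, can be sketched in Python as follows. The lists hold the odd and even numbers used above; the variable names are our own, and the built-in `sum` and `len` functions play the roles of the total and of n.

```python
# X: odd numbers between 1 and 10; Y: even numbers between 1 and 10.
X = [1, 3, 5, 7, 9]
Y = [2, 4, 6, 8, 10]

# Adding up all the scores in one variable.
sum_X = sum(X)
sum_Y = sum(Y)

# X + Y: add each X score to the corresponding Y score.
X_plus_Y = [x + y for x, y in zip(X, Y)]

# n: the number of scores in each set.
n_X = len(X)
n_Y = len(Y)

print("Sum of X:", sum_X)
print("Sum of Y:", sum_Y)
print("X + Y:", X_plus_Y)
print("n of X:", n_X, "and n of Y:", n_Y)
```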
Note also that between 1 and 10 inclusive there are five odd numbers
and no number occurs more than once. Similarly, between 1 and 10
inclusive there are five even numbers, and again no number is
repeated.
We can state that n of X is 5
and n of Y is also 5.
In this case, the value of n is the same in both instances. They need not be
the same.
Suppose X represents 0, 1, 3, 5, 7, 9.
That is, the set of scores starts from and includes 0, as would meaningfully be the
case if you were adding up cash-in-hand in your class, where you are likely to
come across at least one person who has no money with him or her.
Note also that the frequency of occurrence of any particular score in the
group of scores (X) is 1.
Exercise 3.2
Frequently, a set of scores (Y) may have to be subtracted from a set of scores
(X).
It is written as X-Y, meaning take the set of scores in X and subtract
corresponding scores in Y.
Two sets of scores can be multiplied with each other. It is written as X.Y.
Here the dot (.) is used to represent the more familiar multiplication sign (x), in
order to avoid confusing the sign with the X variable or set of scores. A very
special case of multiplication is when you multiply a score by itself, as in
4.4 = 16 = $4^2$, or a set of scores in which each score is multiplied by itself, as in
X.X = $X^2$ and Y.Y = $Y^2$. This form of representation of the process is particularly
useful when you want to multiply X by itself and by itself again, namely
X.X.X = $X^3$. You must learn to distinguish between $X^3$, as in $3^3$ = 3 x 3 x 3 = 27
and $4^3$ = 4.4.4 = 64, and 3X, which is X multiplied by a factor of 3, as in
3 x 3 = 9 and 3 x 4 = 12. Similarly, X may be divided by Y, and it is written as X/Y.
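The operations on sets of scores described above can be tried out in Python. This is a sketch only: it reuses the odd and even numbers between 1 and 10 as X and Y, and each operation is applied to corresponding scores, one pair at a time.

```python
X = [1, 3, 5, 7, 9]
Y = [2, 4, 6, 8, 10]

# X - Y: subtract each Y score from the corresponding X score.
X_minus_Y = [x - y for x, y in zip(X, Y)]

# X.Y: multiply each X score by the corresponding Y score.
X_times_Y = [x * y for x, y in zip(X, Y)]

# X squared and X cubed: each score multiplied by itself.
X_squared = [x ** 2 for x in X]
X_cubed = [x ** 3 for x in X]

# 3X: each score multiplied by a factor of 3 (not the same as X cubed).
three_X = [3 * x for x in X]

# X/Y: divide each X score by the corresponding Y score.
X_over_Y = [x / y for x, y in zip(X, Y)]

print("X - Y:", X_minus_Y)
print("X.Y:", X_times_Y)
print("X squared:", X_squared)
print("X cubed:", X_cubed)
print("3X:", three_X)
print("X/Y:", X_over_Y)
```

Comparing the lines for X cubed and 3X shows the distinction drawn in the text: for the score 3, the cube is 27 while three times the score is 9.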
Exercise 3.3
What is the mean salary grade level of primary school teachers? (GL 05).
The mean of a set of scores (X) is represented by $\bar{X}$, pronounced "X-bar".
$\therefore \bar{X} = \frac{\sum X}{N}$
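The mean is the sum of the scores divided by the number of scores. A minimal Python sketch, using the odd numbers between 1 and 10 from earlier in this unit as the set X (the variable names are our own):

```python
import statistics

# The set of scores X (odd numbers between 1 and 10).
X = [1, 3, 5, 7, 9]

# N: the number of scores; the mean is the sum of X divided by N.
N = len(X)
mean_X = sum(X) / N

print("Sum of X:", sum(X))
print("N:", N)
print("Mean of X:", mean_X)

# The standard library gives the same result.
print("statistics.mean:", statistics.mean(X))
```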
Exercise 3.6
(i) Take X = 0, 1, 3, 5, 7, 9. What is $\bar{X}$?
(ii) Take Y = 2, 4, 6, 8, 10. What is $\bar{Y}$?
(iii) If X = 1, 3, 5, 7, 9
and Y = 2, 4, 6, 8, 10, what is (Y-X)/N?
4.0 CONCLUSION
Remember that our task is to learn to handle large amounts of numerical data
in concise, meaningful ways which all would understand. Statisticians have
invented all kinds of notations/shorthand/symbols that enable you to
describe data and the mathematical operations that can be carried out on them.
Next, we can take a deeper look at the nature of data types, the way they
are derived from measurement and the various measurement scales that are
used. The nature of the object to be measured dictates the type of
measurement that is done and the scale that is used. The type of data
obtained determines the kind of statistical analysis that is appropriate.
5.0 SUMMARY
In this unit, you have learnt how to represent a set of numbers with the letter X
and another set with Y. You have learnt that, technically, multiple sets of
numbers can be represented by any convenient Roman letters, such as P, Q, R, S, T
or A, B, C, D, etc.
i. ..
ii. $\bar{X}$
iii. f
iv. N
v.
vi. X, Y
vii.
viii. =
ix. +
x. f
xi. $X^2$
xii. $\hat{X}$
Unit 4
MEASUREMENT SCALES.
1.0 INTRODUCTION
All data are derived from some kind of measurement. The scores in a
football tournament are measures of how many goals are scored by
competing teams. At the end of the tournament, all that you see is a record of
performances.
Note that these scores are really counts of goals scored and that there is no
half-goal! Note also that we can talk about total goals scored in the
tournament and goal differences, but we cannot multiply or divide goals
scored.
This type of information, score, is obviously different from the type we got in
the football tournament. It must mean that each object, event or phenomenon
must have its own peculiar way of scoring it. This unit is concerned with the
various scales that are used to measure different variables.
2.0 OBJECTIVES
In the first situation, the numbers have been assigned purely on arbitrary
basis. Any student could be assigned No. 1 while any one could be assigned
No. 30. No two students can be compared on the basis of allotment of
numbers, in any respect. The students have been labeled from 1 to 30 in
order to give each an identity. This type of scale is described as a nominal
scale. Here the property of identity is applicable but the properties of order
and additivity are not applicable.
In the second situation, the students have been assigned their position
numbers in a queue from 1 to 30. Here the numbering is not arbitrary. The
numbers have been assigned according to the height of the students. So the
students are comparable on the basis of their heights, as there is a sequence in
this regard. Every subsequent child is taller than the previous one, and so on.
This type of scale is described as an ordinal scale. Here the object or event
has its identity as well as order. As the difference in height between any two
students is not known, the property of addition of numbers is not applicable
to the ordinal scale.
In the third situation, the students have been awarded marks from 0 to 50 on
the basis of their performance in the test administered on them. Consider the
In the fourth situation, the exact physical values pertaining to the heights and
weights of all students have been obtained. Here the values are comparable
in all respects. If two students have heights of 120 cm and 140 cm, then the
difference in their heights is 20 cm and the heights are in the ratio 6:7. This
scale is referred to as a ratio scale.
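The two comparisons made possible by a ratio scale, the difference and the ratio, can be verified with a short Python sketch. The `Fraction` type from the standard library reduces a ratio to its lowest terms; the variable names are our own.

```python
from fractions import Fraction

# Heights of the two students, in centimetres (from the example above).
height_a = 120
height_b = 140

# Difference between the two heights.
difference = height_b - height_a

# Ratio of the two heights, reduced to lowest terms.
ratio = Fraction(height_a, height_b)

print("Difference in height:", difference, "cm")
print("Ratio of heights:", f"{ratio.numerator}:{ratio.denominator}")
```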
Exercise 4.1
In all these cases, the values can have fractions and the fractions are
meaningful. Thus, continuous data are usually associated with interval and
ratio scales of measurement.
You have now learnt that the type of data you have depends on the type of
measurement you have made. The type of measurement you make is itself
dependent on the characteristic or property of the object or event that is being
measured. If your "measurement" is about an object or event that is
described by counting, classification, categorization, "order of", "degree of"
or "extent to", we say you are doing qualitative analysis and the data are
qualitative. A critical analysis of a historical event is a qualitative analysis.
So, in an analysis where this conversion has been done for the variable of
gender, each time we see gender=1 we know we are talking about males, and
when gender=2 we know we are talking about females. These conversions do
not make the data any less qualitative because, in this case, you cannot add 1+2!
The statistical analyses which are associated with qualitative data are
described as non-parametric.
If, on the other hand, your measurement is about an object or event that is
continuous, such as time, weights, heights, scores in a test, temperature and
intelligence, then the data are continuous and the statistical analysis is
quantitative. Most scientific experiments involve measurements of attributes
which are continuous in nature; scientific data are therefore almost always
numerical and can be treated mathematically. They can be added or
subtracted, multiplied or divided. The statistical analyses which are
associated with quantitative data are described as parametric analyses.
[Figure: classification of variables into qualitative and quantitative, with the
labels continuous data, discrete data, nominal data, ordinal data, interval data
and ratio data, and the associated non-parametric and parametric analyses.]
Conclusion
In this unit, you have learnt four types of measurement scales that give rise to four
types of statistical data, namely:
i. Nominal data: recognized by name (from the French word nom), e.g.
names of staff.
ii. Ordinal data: recognized by order/rank but no magnitude, such as academic
titles and ranks (Professor, Associate Professor, Senior Lecturer, Lecturer,
etc.).
iii. Interval data: recognized by the meaningfulness of part scores between units
and equality of units but no absolute zero (e.g. the metre rule).
iv. Ratio data: recognized by the characteristics or attributes associated with
interval data, in addition to those of nominal and ordinal data,
UNIT 5
1.0 INTRODUCTION
So far you have been exposed to the various types of data, and you have
learnt the scales or levels of measurement used in generating data. We shall
now move forward in the handling of data by arranging and presenting the
data using different methods. This is a step towards the analysis of data.
Although we have always emphasized that statistics is about large amounts of
data, we shall be using relatively small amounts of data in this course, in
order to enable you to learn the methods very well and to do the
calculations conveniently by hand.
2.0 OBJECTIVES
When a statistician is confronted with data, the first thing he does is some
kind of organization. Organization of data involves the arrangement or
grouping of data in order of magnitude or according to species, type, kind,
form or pattern. Naturally, data or scores collected from observations are
disorderly and not arranged in any manner. There is therefore a need to
organize the scores in an order.
However, when the number of students involved is large, the simple listing of
names and scores will not make proper analysis any easier. Sequencing can
be used.
7, 5, 3, 9, 5, 8, 4, 7, 4, 2
3, 2, 5, 6, 4, 8, 6, 4, 7, 6
5, 7, 6, 8, 5, 4, 7, 3, 5, 6
These scores are not arranged in any order. They can be arranged in
ascending or descending order, as follows.
2, 2, 3, 3, 3, 4, 4, 4, 4, 4
5, 5, 5, 5, 5, 5, 6, 6, 6, 6
6, 7, 7, 7, 7, 7, 8, 8, 8, 9
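The sequencing shown above can also be checked with a few lines of Python.
This is only a minimal sketch: the variable names are illustrative and are
not part of the course text, and the course itself expects you to do the
arrangement by hand.

```python
# Raw scores from the example above, in the order they were recorded.
scores = [
    7, 5, 3, 9, 5, 8, 4, 7, 4, 2,
    3, 2, 5, 6, 4, 8, 6, 4, 7, 6,
    5, 7, 6, 8, 5, 4, 7, 3, 5, 6,
]

ascending = sorted(scores)                  # lowest score first
descending = sorted(scores, reverse=True)   # highest score first

print("Ascending: ", ascending)
print("Descending:", descending)
```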
Activity 5.1
Given that a class of 40 students was given a test in technical drawing, out of
a maximum of 15 marks, the following scores were obtained.
In the last activity, i.e. activity 5.1, where you arranged the scores of 40
students in order, you noted that this method of organising data is very
clumsy, especially when the number of scores in the distribution is very
large.
Example 5.2. Given the same set of scores as in example 5.1 above, we are
required to form a frequency table with it.
7, 5, 3, 9, 5, 8, 4, 7, 4, 2
3, 2, 5, 6, 4, 8, 6, 4, 7, 6
5, 7, 6, 8, 5, 4, 7, 3, 5, 6
Score   Tally      F
2       ||         2
3       |||        3
4       |||||      5
5       ||||| |    6
6       |||||      5
7       |||||      5
8       |||        3
9       |          1
Scores Frequency
2 2
3 3
4 5
5 6
6 5
7 5
8 3
9 1
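The tallying in Example 5.2 amounts to counting how often each score occurs.
The sketch below does the same count with Python's standard library; the
names used are illustrative assumptions, and the strokes printed in the
tally column are not grouped in fives as they would be by hand.

```python
from collections import Counter

# The 30 scores used in Example 5.2.
scores = [
    7, 5, 3, 9, 5, 8, 4, 7, 4, 2,
    3, 2, 5, 6, 4, 8, 6, 4, 7, 6,
    5, 7, 6, 8, 5, 4, 7, 3, 5, 6,
]

frequency = Counter(scores)  # maps each score to its number of occurrences

print("Score  Tally     F")
for score in sorted(frequency):
    f = frequency[score]
    print(f"{score:>5}  {'|' * f:<8}  {f}")
```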
Activity: 5.2
The type of data presented above is called ungrouped data. You will note
that, most of the time, the data collected and recorded, especially in the
social sciences and education, may be very numerous. In such a situation we
need to group the scores so as to organize and present them. This is because
it is cumbersome to study or interpret large data sets without grouping them,
even if they are arranged sequentially. The data are therefore organized into
groups called classes and presented in a table which gives the frequency of
each group or class. Such a frequency table gives us a better overall view of
the distribution of the data and enables us to rapidly understand important
characteristics of the data.
Example 5.3
The scores of a group of students from an examination were recorded as
follows:
i. Find the range: the highest score minus the lowest score. In this case,
it is 93 - 48 = 45.
ii. Decide on how many classes or groups you want. This depends on
the size of the data, but between 5 and 20 groups are recommended. For
very large data sets, 10 to 20 groups are recommended, but for relatively
small data sets use between 5 and 10 groups.
iii. The width of the intervals, or class interval (C.I.), is obtained by
dividing the range by the number of classes,
e.g. range = 45
number of classes = 10
∴ Class interval = 45/10 = 4.5, which is rounded up to 5.
The preferred lengths of class interval are 2, 3, 5, 10 or 20.
iv. Group the scores and tally as usual.
S/No   Class Interval   F
1      93 - 97          2
2      88 - 92          4
3      83 - 87          8
4      78 - 82          15
5      73 - 77          12
6      68 - 72          20
7      63 - 67          8
8      58 - 62          6
9      53 - 57          4
10     48 - 52          1
                        N = 80
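The arithmetic in steps i to iii can be sketched in Python as follows. The
rounding up of 4.5 to 5 with math.ceil is an assumption that matches what the
example does, and the variable names are illustrative only.

```python
import math

lowest, highest = 48, 93     # lowest and highest scores from step i
number_of_classes = 10       # number of classes chosen in step iii

score_range = highest - lowest                        # 93 - 48 = 45
width = math.ceil(score_range / number_of_classes)    # 4.5 rounded up to 5

# Build the (lower limit, upper limit) pairs, starting from the lowest score.
classes = [
    (lowest + k * width, lowest + (k + 1) * width - 1)
    for k in range(number_of_classes)
]

# Print the highest class first, as in the table above.
for lower, upper in reversed(classes):
    print(f"{lower} - {upper}")
```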
Activity 5.3
In both example 5.3 and activity 5.3, you will have noted that the frequency
tables have only two columns once the tally column is removed. From a
frequency distribution, however, we may add other columns for relative
frequency, cumulative frequency and cumulative percentage. When these are
added, the table becomes a composite table.
Example 5.4
S/No  Class Int.  F   R.F.             C.F.                          C%
1     93 - 97     2   2/80 = 0.025     2+4+8+15+12+20+8+6+4+1 = 80   80/80 x 100 = 100%
2     88 - 92     4   4/80 = 0.050     4+8+15+12+20+8+6+4+1 = 78     78/80 x 100 = 97.5%
3     83 - 87     8   8/80 = 0.100     8+15+12+20+8+6+4+1 = 74       74/80 x 100 = 92.5%
4     78 - 82     15  15/80 = 0.1875   15+12+20+8+6+4+1 = 66         66/80 x 100 = 82.5%
5     73 - 77     12  12/80 = 0.150    12+20+8+6+4+1 = 51            51/80 x 100 = 63.75%
6     68 - 72     20  20/80 = 0.250    20+8+6+4+1 = 39               39/80 x 100 = 48.75%
7     63 - 67     8   8/80 = 0.100     8+6+4+1 = 19                  19/80 x 100 = 23.75%
8     58 - 62     6   6/80 = 0.075     6+4+1 = 11                    11/80 x 100 = 13.75%
9     53 - 57     4   4/80 = 0.050     4+1 = 5                       5/80 x 100 = 6.25%
10    48 - 52     1   1/80 = 0.0125    1                             1/80 x 100 = 1.25%
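The columns of a composite table follow mechanically from the frequencies.
The sketch below recomputes them for the class intervals and frequencies of
Example 5.3; the variable names are illustrative assumptions, and cumulative
frequencies are accumulated from the lowest class upward, as in the table.

```python
# Class intervals and frequencies from Example 5.3, lowest class first.
classes = ["48-52", "53-57", "58-62", "63-67", "68-72",
           "73-77", "78-82", "83-87", "88-92", "93-97"]
freqs = [1, 4, 6, 8, 20, 12, 15, 8, 4, 2]

n = sum(freqs)  # total number of scores

rows = []
running = 0
for label, f in zip(classes, freqs):
    running += f  # cumulative frequency up to and including this class
    rows.append((label, f, f / n, running, 100 * running / n))

# Print the highest class first, as in the composite table.
print("Class     F    R.F.   C.F.     C%")
for label, f, rf, cf, cpct in reversed(rows):
    print(f"{label:>7} {f:>3} {rf:7.4f} {cf:>5} {cpct:7.2f}")
```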
Activity 5.4
So far you have learnt how to present data using tables. Now, you are going
to see another method of presenting data. This method is called Pictograms
or Ideographs. It involves the representation of groups of numerical data by
the use of some pictures or diagrams. This is done such that a single picture
or diagram represents a specified number of scores, items or objects.
Example 5.5
Given that the population of some towns in Abia State are as follows (this is
hypothetical):
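This table can be represented with ideographs in this way. First, we choose
a convenient and representative picture which will represent a specified
number. For instance, you can let one picture represent 1,000,000 people.
Each town is then shown with the number of pictures that corresponds to its
population.
[Pictogram: one row each for Aba, Umuahia, Ohafia, Bendel, Ukwa and Obingwa;
the picture symbols are not reproduced here.]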
Example 5.6
Okigwe   s s s s s s s s
Owerri   s s s s s s s s s s
Orlu     s s s s s s s s s s s
Obowo    s s s s s s s
Oru      s s s s s
Onuimo   s s s s s
Orsu     s s s
Activity 5.5
i. Enugu 850,000.
ii. Kaduna 700,000
iii. Kano 650,000
iv. Onitsha 750,000
v. Ibadan 920,000
vi. Benin 680,00
vii. Port Harcourt 930,000
viii. Umuahia 800,000
ix. Minna 540,000
4.0 CONCLUSION
In this Unit you have been exposed to the various ways of organizing and
presenting your data. Since in education you will be involved in generating a
lot of numerical data from examinations, tests and research results, it is
expected that you will not be in a tight corner as far as handling the data is
concerned. You can decide to take any method to present your data for
analysis.
5.0 SUMMARY
In this unit you have seen how to arrange data in either ascending or
descending order. This is referred to as sequencing, and it is one way of
organizing data. You have also learnt how to present data in a frequency
table, using tallies, for both ungrouped and grouped data. Frequency means
the number of occurrences in a particular group or set. In this unit too, you
have learnt how to group data using a class interval, which is obtained by
dividing the range by the number of groups. Remember that the number of
groups depends on the size of the data. For data sets that are not too large,
5 to 10 groups are recommended, but for large data sets the recommended
number of groups is from 10 to 20.
You have seen how to construct a composite frequency distribution table that
includes the class intervals, frequencies, relative frequencies, cumulative
frequencies and cumulative percentages.
Apart from sequencing and tabular presentations you have learnt that pictures
or diagrams can be used. Other methods of presenting data called graphical
methods will be seen in the next unit.
1. Group the scores with a class interval of 5 and use the groups to form a
composite table.
2. Arrange the scores above in ascending order of magnitude.
3. What is an ideograph?
Given that the numbers of books in a local Library are as follows:-
Section A 5,000
Section B 7,000
Section C 15,000
Section D 3,000
Section E 1,500
Section F 6,300
Section G 2,800
Section H 3,500
7.0 REFERENCE
UNIT 6
1.0 INTRODUCTION
Data which are shown in tabular form, as you have seen in the last unit, can
also be displayed using graphs. You will note that a well constructed
graphical presentation is the easiest way to show a given set of data. In this
unit, the graphical methods of representing data in statistics, such as the
bar chart, pie chart, histogram, frequency polygon, cumulative frequency
curve and cumulative percentage curve, will be explained and illustrated.
2.0 OBJECTIVES.
You are familiar with graphs and their construction. In your mathematics
or science lessons in your secondary school days, you used to plot graphs.
The bar chart (also called a bar diagram or bar graph) uses rectangular bars
of equal breadth and different heights or lengths to represent the table.
Like every other graph which you are used to, the bar chart has two axes:
the frequency axis, or vertical axis, and the item axis, or horizontal axis.
The height of each bar is proportional to the frequency.
Example 6.1
Given that Mr. Adewale's farm has the following animals:
You remember that the frequencies are plotted on the vertical axis. Therefore
note that the highest number on the frequency is 150. Since it is difficult to
divide the frequency axis into 150 units, we choose a suitable scale. Let us
take 1 cm to represent 10 units along the frequency axis. So we set up the
axes as shown.
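[Figure: axes for the bar chart of Example 6.1. Vertical axis: frequency (F),
marked from 0 to 160 in steps of 20. Horizontal axis: animals (Cow, Goat,
Pig, Rabbit, Sheep, Dog).]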
In constructing a bar chart there are some very important notes to take. These
are:-
Activity 6.1
Represent the following data in a bar chart.
A
Crops Yam Cassava Maize Potatoes Fruits Vegetables
Frequency 500 kg 2000 kg 900 kg 1500 kg 800 kg 300 kg
B
Colours Red White Yellow Black Blue Brown Pink
Frequency 360 120 500 80 450 200 300
The word pie can be traced to the British circular pie dish or to the
mathematical pi. The pie chart uses a circular construction to represent the
statistical data, such that the data are placed in sectors whose sizes are
obtained using proportions that represent the frequencies.
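One common way of turning the proportions into sectors is to give each item
an angle equal to its share of the 360 degrees in a circle. The sketch below
illustrates that arithmetic. The frequencies in it are hypothetical and are
not taken from any example in this unit, and the variable names are
illustrative only.

```python
# Hypothetical frequencies, used only to illustrate the arithmetic.
livestock = {"Cows": 30, "Goats": 45, "Sheep": 15, "Pigs": 30}

total = sum(livestock.values())  # all items together make the full circle

# Each sector angle is (frequency / total) x 360 degrees.
angles = {name: 360 * f / total for name, f in livestock.items()}

for name, angle in angles.items():
    print(f"{name:<6} {angle:6.1f} degrees")
```

The angles should always add up to 360 degrees, which is a useful check
before drawing the sectors with a protractor.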
Example 6.2
[Figure: pie chart for Example 6.2, with sectors labelled Pigs, Goats, Dogs,
Rabbits, Cows, Turkey and Sheep.]
Activity 6.2
The table below shows the frequency distribution of students offering some
subjects in SSCE in 2001/2002 session in a secondary school in a state in
Nigeria.
3.3 HISTOGRAM
You will recall from example 6.1 that bar charts are constructed with
rectangular bars of equal width and different heights which correspond to the
frequencies. The histogram is also represented with rectangular bars. But the
bars are not separated as is the case with the bar chart. Can you guess why?
This is because the data here are continuous data; therefore, the continuity is
shown at the base of the rectangles. The histogram is therefore a bar graph
frequency distribution in which the bars are not separated.
You have noted the introduction of class boundaries here. Class boundaries
mark the limits of a class interval or group. In other words, a class interval
or group has two class boundaries: the upper class boundary and the lower
class boundary. Take the group 62-67, for instance: the upper class
boundary is 67 and the lower class boundary is 62. But in the histogram we
make use of the real limits, so we have the real lower limit and the real
upper limit. In this case the real lower limit is 61.5 and the real upper
limit is 67.5. To understand this better, take the number 1, for instance.
The real lower limit or boundary is 0.5 and the real upper limit or boundary
is 1.5. So 1 can be described with its real limits as from 0.5 to 1.5.
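The real limits described above can be sketched as a small Python helper.
The function name real_limits and its unit parameter are illustrative
assumptions. The sketch assumes scores are recorded to the nearest whole
number, so each limit is extended by half a unit.

```python
def real_limits(lower, upper, unit=1):
    """Return (real lower limit, real upper limit) for a score or class.

    `unit` is the smallest step in which scores are recorded (assumed 1).
    """
    half = unit / 2
    return lower - half, upper + half


print(real_limits(62, 67))  # the class interval 62-67
print(real_limits(1, 1))    # the single score 1
```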
i. Draw the vertical and horizontal axes (it is easier to use graph papers
where the lines are already drawn).
ii. Mark the real class boundaries along the horizontal axis or the score
axis.
iii. Mark the frequencies along the vertical axis.
iv. Construct the bars for each class boundary with a height
corresponding to the frequencies.
Example 6.3
Steps to follow:
Plot the frequencies against the real class boundaries using suitable scales;
let us use 2 cm to represent 3 units on the frequency axis.
[Figure: histogram for Example 6.3; horizontal axis: Scores.]
Points to note:-
i. The horizontal axis, otherwise called the abscissa, represents the score
possibilities, which may be single scores or class intervals. In the
case of grouped data, the abscissa is usually marked off by the mid
points or the real limits of the intervals.
ii. Start on the left with the lowest values and then proceed to the right
with as many intervals as are necessary to include all the scores. Do
not extend this axis to zero, unless scores of zero or near zero have
been observed. By convention, leave an empty interval at both ends
to show zero frequency in those intervals.
iii. The vertical axis, otherwise called the ordinate, represents the
frequencies. This axis is marked off with zero at the bottom and
moves upwards to the greatest frequency.
iv. The selection of the distance or scale along either axis is arbitrary, but
it is a convention among statisticians to follow the three-quarter high
rule. This states that the vertical axis should be laid out such that the
height of the maximum point or highest frequency is approximately
three-quarters of the length of the horizontal axis. (Take note of this
rule whenever you want to plot a graph in statistics.)
v. A bar or rectangle is drawn above each score interval on the
horizontal axis. The width extends from the lower real limit to the
upper real limit of the interval. Bars are adjacent to and touch each
other to show the continuity of the scores in continuous data.
Where there is zero frequency, leave an empty space or interval.
vi. The vertical axis should be labelled for frequency, while the horizontal
axis is labelled to show what is being measured (e.g. scores, height,
weight in grams, time in seconds, temperature in degrees Celsius or
Fahrenheit, etc.). Always add a descriptive title indicating what the
graph is showing.
Activity 6.3
Given below are the frequency distributions of grouped scores for a set of
students in a Geography test. Prepare a composite table and use it to
construct a Histogram.
You have learnt how to construct graphs, but the graphs you have so far
constructed in this unit were bar graphs. The frequency polygon considers
the frequencies against the scores as in the histogram, but this time the
polygon is a line graph which plots the frequencies against the class marks,
or mid points, of the class intervals. Each point of intersection between a
frequency and its class mark is marked with an X or a dot. These points are
joined with straight lines, which are made to rest on the horizontal axis.
The polygon can also be plotted on a histogram, but this takes time.
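The points of a frequency polygon can be listed before any drawing is done.
The sketch below uses the class intervals and frequencies from Example 5.3 of
the previous unit purely as an illustration. The variable names are
assumptions, and the two zero-frequency end points are one way of making the
polygon rest on the horizontal axis.

```python
# Class limits and frequencies from Example 5.3, lowest class first.
classes = [(48, 52), (53, 57), (58, 62), (63, 67), (68, 72),
           (73, 77), (78, 82), (83, 87), (88, 92), (93, 97)]
freqs = [1, 4, 6, 8, 20, 12, 15, 8, 4, 2]
width = 5  # class interval size

# The class mark (mid point) is the average of the two class limits.
points = [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, freqs)]

# Add the mid points of the empty intervals at both ends, with frequency 0,
# so that the polygon rests on the horizontal axis.
points.insert(0, (points[0][0] - width, 0))
points.append((points[-1][0] + width, 0))

for class_mark, f in points:
    print(f"{class_mark:6.1f}  {f}")
```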
Example 6.4
Consider the group scores obtained from a computer science test by a group
of students in a school.
Steps to follow:
[Figure: frequency polygon for Example 6.4. Vertical axis: frequency (F),
marked from 10 to 26 in steps of 2. Horizontal axis: scores, marked at
7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62 and 67.]
Activity 6.4
S/No   Marks     F
1      85 - 89   1
2      80 - 84   2
3      75 - 79   6
4      70 - 74   10
5      65 - 69   15
6      60 - 64   25
7      55 - 59   38
8      50 - 54   20
9      45 - 49   13
10     40 - 44   5
You have learnt that cumulative frequency shows the number of scores which
are below the upper limit of each class interval. The cumulative frequency
curve which is otherwise called Ogive is a graph of the cumulative
frequencies against the scores or the real exact limits of the class interval.
The points of contact represent the cumulative frequencies at the exact upper
limits of the intervals.
You will have to take note that the general trend of the Ogive is progressively
rising, there are no inversions or set backs. The upward rise is not a straight
line. It usually takes the shape of a shallow S. While the upper branch
approaches its limit N gradually, the lower branch approaches its limit of
Zero but not as gradually as the upper branch.
Example 6.5
Represent the grouped data below in a cumulative frequency curve or Ogive.
N = 50
STEPS TO FOLLOW
i. Complete the composite table to include, the real upper limit (RUL)
and the cumulative frequencies (C. F.)
ii. With suitable scales draw the vertical and horizontal axes, with the
vertical axis having the cumulative frequencies, while the horizontal
axis has the real upper limits
iii. Match the cumulative frequencies against their corresponding real
upper limits and join with a smooth curve.
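The steps above can be sketched in Python. To keep the figures checkable,
the sketch reuses the class intervals and frequencies of Example 5.3 from the
previous unit rather than the data of this example. The variable names are
illustrative assumptions, and the cumulative percentages are included
because the same points serve for the cumulative percentage curve.

```python
from itertools import accumulate

# Class limits and frequencies from Example 5.3, lowest class first.
classes = [(48, 52), (53, 57), (58, 62), (63, 67), (68, 72),
           (73, 77), (78, 82), (83, 87), (88, 92), (93, 97)]
freqs = [1, 4, 6, 8, 20, 12, 15, 8, 4, 2]

n = sum(freqs)
real_upper_limits = [upper + 0.5 for _, upper in classes]  # RUL of each class
cumulative = list(accumulate(freqs))                        # C.F. column
cumulative_pct = [100 * cf / n for cf in cumulative]        # C% column

# Each (RUL, C.F.) pair is one point on the Ogive.
ogive_points = list(zip(real_upper_limits, cumulative))

for rul, cf, pct in zip(real_upper_limits, cumulative, cumulative_pct):
    print(f"{rul:5.1f}  {cf:>3}  {pct:6.2f}%")
```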
S/No   C. I.     F.
1      95 - 99   2
2      90 - 94   4
3      85 - 89   6
4      80 - 84   10
5      75 - 79   14
6      70 - 74   14
7      65 - 69   50
8      60 - 64   40
9      55 - 59   33
10     50 - 54   8
11     45 - 49   6
12     40 - 44   1
For the scales: let us say 1 cm represent 4 units on the vertical axis and 1.5
cm represent 5 units on the horizontal axis.
Activity 6.5
Use the table given in the activity 6.4 to construct a cumulative frequency
curve.
You have finished activity 6.5; look at the curve you have obtained. Recall
that you obtained the curve using cumulative frequencies against the real
upper limits. Now, if you convert the cumulative frequencies to cumulative
percentages and then plot the values against the exact upper limits, you will
get the same curve. So in some situations, we may wish to use the
cumulative percentages instead of the cumulative frequencies in the
construction of the curve. In this case the cumulative frequencies are
converted to percentages, and these are plotted to get the shallow S-curve.
The advantage here is that one can very quickly approximate the percentage
of the total number of cases which fall below certain scores.
Activity 6. 6
4.0 CONCLUSION
In this unit, you have been exposed to the various types of graphical
representation of statistical data, and their applications in representing the
data. As teachers who generate data often, you can choose any of these
methods at any time, depending on what you want, to represent your data for
easy interpretations.
You can go through these methods again for your practice and familiarise
yourself with them. Once again they are Bar chart, pie chart, histogram,
frequency polygon, cumulative frequency curve and cumulative percentage
curve.
5.0 SUMMARY
In this unit, you have been able to go through with illustrations and
constructions, the graphical methods of representing statistical data. These
are:
i. The bar graph, which involves constructing rectangular bars using the
frequencies against the items to represent the data.
ii. The pie chart which involves using proportions that represent the
frequencies.
iii. The histogram which is a bar graph frequency distribution in which
the bars are not separated because the data are continuous. Its
construction involves plotting the frequencies against the scores or
class boundaries of corresponding class intervals.
iv. Frequency polygon which is a graph that considers the frequencies
against the class marks or mid points of the class intervals.
v. Cumulative frequency curve which is otherwise called Ogive is also a
graph, but that of the cumulative frequencies against the scores or the
exact limits of the class intervals.
vi. The cumulative percentage curve, which can be used sometimes in the
place of cumulative frequency curves, involves converting the
cumulative frequencies into cumulative percentages and using them to
plot against the exact upper limits of the class intervals. The points
are joined to get a shallow S-curve.
9 81 90 4
10 91 100 1
200
7.0 REFERENCES
Unit 7
1.0 INTRODUCTION
2.0 OBJECTIVES
i. Define the mean and calculate the mean from a given set of scores.
ii. Explain the median and find the median from a given set of scores.
iii. Describe the mode and locate the mode in a given set of scores.
You are familiar with the arithmetic average, which is used to find the average
performance of the students in your class, or the average performance of
students in different school subjects. This is the same as the mean, which is
an interval statistic and is generally the most reliable, most stable and
most widely used measure of central tendency. It takes into account every
score in the distribution, and it can be used in computation for more
sophisticated statistical analyses. It is equal to the sum of the scores
divided by the number of scores. The symbol is $\bar{x}$ and the formula is
$\bar{x} = \frac{\sum X}{N}$
where $\bar{x}$ = mean
$\sum$ = sum of
X = raw score
N = number of scores
Example 7.1
$\therefore \bar{x} = \frac{130}{10} = 13$
The example above is used when small number of data is given. Most of the times,
data can come in a frequency table as in the example below.
Example 7.2
S/No X F
1 10 3
2 9 2
3 8 4
4 7 4
5 6 3
6 5 4
7 4 4
8 3 2
9 2 3
S/No X F FX
1 10 3 30
2 9 2 18
3 8 4 32
4 7 4 28
5 6 3 18
6 5 4 20
7 4 4 16
8 3 2 6
9 2 3 6
Total: $\sum f = 29$, $\sum fx = 174$
Steps to follow:-
i. The formula is $\bar{x} = \frac{\sum fx}{\sum f}$. Therefore add the next
column on the table, which is fx.
ii. Find $\sum f = 29$
iii. Find $\sum fx = 174$
iv. $\therefore \bar{x} = \frac{\sum fx}{\sum f} = \frac{174}{29} = 6$
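The steps above can be checked with a short Python sketch that uses the
scores and frequencies of Example 7.2. The variable names are illustrative
assumptions.

```python
# Scores (X) and their frequencies (F) from Example 7.2.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2]
freqs = [3, 2, 4, 4, 3, 4, 4, 2, 3]

sum_f = sum(freqs)                                   # sum of f
sum_fx = sum(f * x for x, f in zip(scores, freqs))   # sum of fx

mean = sum_fx / sum_f
print(sum_f, sum_fx, mean)
```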
Example 7.2
Steps to follow:
The mean can also be calculated using another method called assumed mean or
deviation method. Let us take another example to illustrate this.
Example 7.4
Assumed mean method or deviation method
S/No   C. I.     F
1      55 - 59   2
2      50 - 54   2
3      45 - 49   6
4      40 - 44   8
5      35 - 39   12
6      30 - 34   14
7      25 - 29   24
8      20 - 24   12
9      15 - 19   16
10     10 - 14   4

S/No   C. I.     CM   F     X    FX
1      55 - 59   57   2     5    10
2      50 - 54   52   2     4    8
3      45 - 49   47   6     3    18
4      40 - 44   42   8     2    16
5      35 - 39   37   12    1    12
6      30 - 34   32   14    0    0
7      25 - 29   27   24    -1   -24
8      20 - 24   22   12    -2   -24
9      15 - 19   17   16    -3   -48
10     10 - 14   12   4     -4   -16
Total                 100        -48
Steps to follow:-
ii. Note the group which is centrally located, or which has about half of the
scores. Its mid point is taken as the assumed mean. In this case, take 32.
iii. The deviation is coded 0 at the assumed mean. Then above it we have
1, 2, 3, ..., and below it we have -1, -2, -3, ...
vi. Use the formula for the assumed mean method:
$\bar{x} = AM + i\left(\frac{\sum fx}{\sum f}\right)$, where $\bar{x}$ = mean,
AM = assumed mean and i (int.) = class interval size.
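The assumed mean method can be sketched in Python with the table of
Example 7.4. The variable names are illustrative assumptions. The direct
calculation from the class marks is included only as a cross-check that both
routes give the same mean.

```python
# Class marks (CM) and frequencies (F) from Example 7.4, highest class first.
class_marks = [57, 52, 47, 42, 37, 32, 27, 22, 17, 12]
freqs = [2, 2, 6, 8, 12, 14, 24, 12, 16, 4]

assumed_mean = 32   # mid point of the centrally located class 30-34
width = 5           # class interval size

# Coded deviations: 0 at the assumed mean, +1, +2, ... above, -1, -2, ... below.
deviations = [(cm - assumed_mean) // width for cm in class_marks]

sum_f = sum(freqs)
sum_fx = sum(f * d for f, d in zip(freqs, deviations))

mean = assumed_mean + width * sum_fx / sum_f

# Cross-check: the mean computed directly from the class marks.
direct_mean = sum(f * cm for f, cm in zip(freqs, class_marks)) / sum_f

print(sum_f, sum_fx, round(mean, 2), round(direct_mean, 2))
```

With these figures the two totals are 100 and -48, and both routes give a
mean of 29.6.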
S/No   C. I.     F.
1      95 - 99   2
2      90 - 94   4
3      85 - 89   6
4      80 - 84   10
5      75 - 79   14
6      70 - 74   14
7      65 - 69   50
8      60 - 64   40
9      55 - 59   33
10     50 - 54   8
11     45 - 49   6
12     40 - 44   1
You have learnt that the mean is an arithmetic average. The median is also
an average. But it is a positional average. It is the middle point in a scale of
distribution of measurement above which half or 50% of the distribution falls
and below which half or 50% of the scores lie. The first step in locating the
middle point or median is to arrange the scores in order-ascending or
descending. You will recall that organisation of data or sequencing involves
arranging the scores in order.
Example 7.5
Find the median of the following set of scores: 10, 21, 15, 30, 22, 12, 11, 6, 5,
3, 4.
Steps to follow:
This is possible when the number N is odd. If the number, N, is even, you
will add the two middle numbers and divide by two.
Example 7.6
Steps to follow:
i. Arrange in order: 11, 12, 13, 14, 15, 16, 16, 17, 19, 20.
ii. Find the middle score or scores, i.e. 15 and 16.
iii. The median therefore is (15 + 16)/2 = 31/2 = 15.5
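The two cases, odd N and even N, can be sketched in one small Python
function. The name median_of is an illustrative assumption; Python's
standard library also offers statistics.median, which gives the same results.

```python
def median_of(scores):
    """Return the median of a list of ungrouped scores."""
    ordered = sorted(scores)      # step i: arrange in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd N: the single middle score
        return ordered[mid]
    # even N: add the two middle scores and divide by two
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_set = [10, 21, 15, 30, 22, 12, 11, 6, 5, 3, 4]       # Example 7.5
even_set = [11, 12, 13, 14, 15, 16, 16, 17, 19, 20]      # Example 7.6

print(median_of(odd_set))   # 11
print(median_of(even_set))  # 15.5
```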
Activity 7.2
Find the median of the following set of scores.
65, 48, 39, 57, 70, 49, 33, 72, 61, 42, 38, 66, 75, 57, 45, 59, 60, 47, 55, 68
You have seen that the examples above used ungrouped data. Median can
also be found in grouped data. Let us take another example with grouped
data.
Example 7.7
S/No   C. I.     F.
1      60 - 64   2
2      55 - 59   2
3      50 - 54   6
4      45 - 49   8
5      40 - 44   12
6      35 - 39   14
7      30 - 34   24
8      25 - 29   12
9      20 - 24   16
10     15 - 19   4

S/No   C. I.     F     C. F
1      60 - 64   2     100
2      55 - 59   2     98
3      50 - 54   6     96
4      45 - 49   8     90
5      40 - 44   12    82
6      35 - 39   14    70
7      30 - 34   24    56
8      25 - 29   12    32
9      20 - 24   16    20
10     15 - 19   4     4
                 N = 100
Steps to follow:
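$\therefore \tilde{X} = L + \left(\frac{N/2 - cfb}{fw}\right) i = 29.5 + \left(\frac{100/2 - 32}{24}\right) \times 5 = 29.5 + 3.75 = 33.25$
Here L is the real lower limit of the class containing the median, cfb is the
cumulative frequency below that class, fw is the frequency within that class
and i is the class interval size.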
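The grouped-data median formula can be sketched in Python using the table of
Example 7.7. The function name grouped_median is an illustrative assumption,
and the sketch assumes whole-number class limits, so that the real lower
limit is the lower limit minus 0.5.

```python
def grouped_median(classes, freqs, width):
    """Median of grouped data by interpolation within the median class.

    classes: (lower limit, upper limit) pairs, lowest class first.
    freqs:   frequency of each class, in the same order.
    width:   class interval size (i in the formula).
    """
    half = sum(freqs) / 2          # N/2
    cfb = 0                        # cumulative frequency below current class
    for (lower, _), fw in zip(classes, freqs):
        if fw > 0 and cfb + fw >= half:
            real_lower = lower - 0.5            # L in the formula
            return real_lower + (half - cfb) / fw * width
        cfb += fw
    raise ValueError("the distribution has no scores")


# Example 7.7, lowest class first.
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39),
           (40, 44), (45, 49), (50, 54), (55, 59), (60, 64)]
freqs = [4, 16, 12, 24, 14, 12, 8, 6, 2, 2]

print(grouped_median(classes, freqs, width=5))  # 33.25
```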
Activity 7.3
3.3 The Mode ($\hat{X}$)
You have learnt that the median is a positional average. The mode is a
measure of popularity. It is defined as the point on the scale of measurement
with maximum frequency in a distribution. In other words, it is the score
value which occurs most frequently in a group of scores.
Example 7.8
Here the most popular score is 20. It has occurred three times, and more than
any other score.
You can see that in the example 7.8 above, we have only one mode. It is
therefore called unimodal. But sometimes you may come across a set of
scores which has two modes.
Example 7.9
1, 2, 9, 6, 7, 5, 2, 8, 9, 4
In this example, 2 and 9 each appeared two times while the others appeared
once. Therefore, the modes are 2 and 9. This distribution is bimodal. You may
also come across some cases where there are more than two modes. Such a
distribution is called multimodal.
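Finding every mode of a set of ungrouped scores can be sketched with
Python's standard library (statistics.multimode, available from Python 3.8).
The variable names are illustrative assumptions.

```python
from statistics import multimode

scores = [1, 2, 9, 6, 7, 5, 2, 8, 9, 4]  # Example 7.9

# multimode returns every score that shares the highest frequency,
# in the order in which the scores first appear.
modes = multimode(scores)

kind = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
print(modes, kind)  # [2, 9] bimodal
```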
You have noted that the two examples above made use of ungrouped data.
When grouped data are given, the mode is the mid point of the class with the
highest frequency i.e. the modal class.
Example 7.10
i = class size
S/No   C. I.     F
1      95 - 99   2
2      90 - 94   4
3      85 - 89   6
4      80 - 84   10
5      75 - 79   14
6      70 - 74   14
7      65 - 69   50
8      60 - 64   40
9      55 - 59   33
10     50 - 54   8
11     45 - 49   9
12     40 - 44   1
4.0 CONCLUSION
In this unit you have been exposed to the measures of central tendency which
are bench marks or typical scores which give precise and brief description of
a set of data. These are very important aspects of statistics which you as a
teacher can not afford to toy with.
To make your data very precise for interpretation, you will need to learn
these measures of location very well.
5.0 SUMMARY
In this unit you have learnt that the measures of central tendency are a set of
bench marks which make precise and brief presentation or description of a set
of scores. The three basic measures of central tendency are the mean, the
median and the mode.
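The mean is the most widely used. It is equal to the sum of the scores
divided by the number of scores. The symbol is $\bar{x}$ and the formula is
$\bar{x} = \frac{\sum X}{N}$ or $\bar{x} = \frac{\sum fx}{\sum f}$, or, for the
assumed mean method, $\bar{x} = AM + i\left(\frac{\sum fx}{\sum f}\right)$.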
7.0 References:-
UNIT 8
MEASURES OF VARIABILITY/DISPERSION I
1.0 INTRODUCTION
2.0 OBJECTIVES
The range is the simplest but a crude and unreliable method of estimating
variability. It is defined as the difference between the highest and the
lowest scores in a given distribution. It is usually affected by the presence
of the two extreme scores. The greater the range, the greater the dispersion
or variability. There are two types of range. The type most commonly used,
and simply called the range, is technically known as the exclusive range. It
is the highest score minus the lowest score in a set of scores. It can be
found using the formula R = Xh - XL, where Xh represents the highest score
and XL is the lowest score.
Example 8.1
66, 59, 72, 62, 57, 54, 66, 79, 14, 65, 64, 95, 59
If you look at the scores very well, you will notice that the lowest score XL
= 14 and the highest score Xh = 95. Therefore the range R will be Xh - XL
= 95 - 14 = 81.
Activity 8.1
Note that you do not need to arrange the scores in any order of magnitude
before finding the range.
The other type of range which is not commonly used is called inclusive
range. It involves subtracting the real lower limit of the smallest score or
observation from the real upper limit of the highest score. It is called
inclusive because both the lowest score and the highest score are included in
this arrangement. It is mostly used with grouped data.
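Both types of range can be sketched in Python using the scores of
Example 8.1. The variable names are illustrative assumptions, and the sketch
assumes whole-number scores, so that the real limits lie 0.5 below the lowest
score and 0.5 above the highest score.

```python
scores = [66, 59, 72, 62, 57, 54, 66, 79, 14, 65, 64, 95, 59]  # Example 8.1

highest = max(scores)   # Xh; no sorting is needed to find it
lowest = min(scores)    # XL

exclusive_range = highest - lowest                    # Xh - XL
inclusive_range = (highest + 0.5) - (lowest - 0.5)    # real upper - real lower

print(exclusive_range)  # 81
print(inclusive_range)  # 82.0
```

Under this assumption the inclusive range is always one unit larger than the
exclusive range.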
Find the inclusive range in the grouped data below.
In the last unit, you learnt that the median is a positional score, which
occupies the middle point on the score scale. In the same way, the quartiles
are positional scores. When we count up from below to include the lowest,
or first, quarter of the cases, we find the point called the first quartile. This
is given the symbol Q1. The first quartile is the score point that sets off the
lower quarter or 25% of the group. In the same way, when we count down
from above to include the highest, or fourth, quarter of the cases, we locate
the third quartile or Q3. In other words, the third quartile is the score point
that sets off the upper 25% or quarter of the scores. The middle quartile Q2
is the median score point. You will note that the quartiles Q1, Q2 and Q3 are
points on the measuring scale. They are division points between the
quarters. We may therefore say of an individual that he is in the highest
quarter or fourth quarter, but not that he is in a certain quartile. So the
quartiles are points that divide a score scale into four equal parts. These
points can be located in a distribution scale.
Example 8.3
Steps to follow:
You have learnt that quartiles are points on the score scale that divide the
total number of observations or scores in a distribution into four equal
groups. You can now locate the points Q1, Q2 and Q3. Then, note that the
interval from Q1 to Q3 contains the middle 50% or half of the scores in a
distribution.
It is called the interquartile range. Note also that if the interquartile range is
divided by two, we shall have what is called the semi-interquartile range or
quartile deviation. This can be found using the formula
$Q = \frac{Q_3 - Q_1}{2}$
Example 8.4
Using the results from example 8.3 we can find the quartile deviation or
semi-interquartile range.
Example 8.5
S/No   C. I.     F
1      55 - 59   1
2      50 - 54   1
3      45 - 49   3
4      40 - 44   4
5      35 - 39   6
6      30 - 34   7
7      25 - 29   12
8      20 - 24   6
9      15 - 19   8
10     10 - 14   2
                 N = 50
Steps to follow:
i. Find N/4 = 50/4 = 12.5
ii. Counting up the frequency column, locate the 12.5th case. If we count
2 + 8, we have 10 cases. It means that we need 2.5 out of the next
frequency, which is 6, to complete the count. We therefore take
2.5/6 x 5 (5 is the interval size) = 2.08
iii. Add the result above to the real lower limit of the class interval, i.e.
19.5 + 2.08 = 21.58. This is Q1.
iv. For Q3, count 12.5 cases from the top down. Here 1 + 1 + 3 + 4 gives
us 9. It means that we need 3.5 to make it up to 12.5 cases. It will
be 3.5 out of the next frequency, which is 6. We therefore have
3.5/6 x 5 = 2.92
v. Since we are going down, we deduct 2.92 from the real upper limit of
the class, i.e. 39.5 - 2.92 = 36.58. This is Q3.
vi. Find the semi-interquartile range Q, using
$Q = \frac{Q_3 - Q_1}{2} = \frac{36.58 - 21.58}{2} = \frac{15.00}{2} = 7.5$
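The working of Example 8.5 can be checked with a short Python sketch. The
function name score_point is an illustrative assumption. Note one difference
from the steps above: the sketch always counts up from the lowest class, so
Q3 is located at 3N/4 cases from the bottom rather than N/4 cases from the
top. Both routes lead to the same point.

```python
def score_point(classes, freqs, width, fraction):
    """Score point below which `fraction` of the cases fall (grouped data).

    classes: (lower limit, upper limit) pairs, lowest class first.
    """
    target = fraction * sum(freqs)   # number of cases to count up
    cfb = 0                          # cumulative frequency below current class
    for (lower, _), fw in zip(classes, freqs):
        if fw > 0 and cfb + fw >= target:
            return (lower - 0.5) + (target - cfb) / fw * width
        cfb += fw
    raise ValueError("the distribution has no scores")


# Example 8.5, lowest class first.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34),
           (35, 39), (40, 44), (45, 49), (50, 54), (55, 59)]
freqs = [2, 8, 6, 12, 7, 6, 4, 3, 1, 1]

q1 = score_point(classes, freqs, 5, 0.25)
q3 = score_point(classes, freqs, 5, 0.75)
semi_interquartile = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(semi_interquartile, 2))
```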
Note that the formula used in locating the median can be applied here,
i.e. $L + \left(\frac{N/2 - cfb}{fw}\right) i$
In the case of quartiles, instead of N/2 you will use N/4, so we have
$Q = L + \left(\frac{N/4 - cfb}{fw}\right) i$
Activity 8.2
Scores 52-54 49-51 46-48 43-45 40-42 37-39 34-36 31-33 28-30
Frequencies 6 11 16 8 9 8 2 3 2
So far you have learnt how to divide a given set of scores into two equal parts
to locate the mid point or the median; you have also learnt how to divide a set
of scores into four equal parts to locate the quartiles. This time we shall
move to another step. This is to divide into ten equal parts to locate the
deciles. Decile points are used to mark off a distribution, thus indicating
points of dividing a distribution of scores into tenths. Thus there are 9 deciles
i.e. from 1 to 9 which divide a distribution into ten equal parts. D1 is the first
decile and below D1 lies the bottom 10% of the group. In the same way D2 is
the point in the distribution below which 20% of the cases fall. Like
quartiles, deciles are points in a distribution not segments.
3.5 PERCENTILES
Percentiles are ordinal measures. They are score points which divide the
distribution into 100 equal parts. In other words, they are points on the raw
score scale below which given percentages of the cases in the distribution
fall. For instance, the 80th percentile is the point on the score scale that
has exactly 80% of the cases below it. Percentiles are symbolised by Px,
with x denoting the particular percentile. Thus, the 90th percentile is
written P90. They are used for decision making when part of a population is
to be selected because of its position within the total.
Note that the median corresponds to the 50th percentile, P50, and the 2nd
quartile, Q2.
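You will recall that the formula for calculating the median is
$L + \left(\frac{N/2 - cfb}{fw}\right) i$
You will also recall that the formula for the first quartile is
$L + \left(\frac{N/4 - cfb}{fw}\right) i$
You can see that they are almost the same. The formula for calculating the
percentiles is an adaptation of this same formula, used for specific
percentile points. The general formula which can be used for any value of Px is
$P_x = L + \left(\frac{\frac{xN}{100} - cfb}{fw}\right) i$
where L is the real lower limit of the class containing Px, N is the number
of cases, cfb is the cumulative frequency below that class, fw is the
frequency within that class and i is the interval size.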
Example 8.6
Note: N/100 = 200/100 = 2; i = 5
S/No   Class Interval   F
1      95-99            1
2      90-94            6
3      85-89            8
4      80-84            33
5      75-79            40
6      70-74            50
7      65-69            24
8      60-64            14
9      55-59            10
10     50-54            8
11     45-49            4
12     40-44            2
                        N = 200
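Steps to follow
i. The steps are the same as those for the median and quartiles.
ii. For P50:
$P_{50} = L + \left(\frac{\frac{200 \times 50}{100} - cfb}{fw}\right) i = L + \left(\frac{100 - cfb}{fw}\right) i = 69.5 + \frac{100 - 62}{50} \times 5 = 69.5 + \frac{38}{50} \times 5 = 69.5 + 3.8 = 73.3$
iii. For P40:
$P_{40} = L + \left(\frac{\frac{200 \times 40}{100} - cfb}{fw}\right) i = L + \left(\frac{80 - cfb}{fw}\right) i = 69.5 + \frac{80 - 62}{50} \times 5 = 69.5 + \frac{18}{50} \times 5 = 69.5 + 1.8 = 71.3$
iv. For P20:
$P_{20} = L + \left(\frac{\frac{200 \times 20}{100} - cfb}{fw}\right) i = L + \left(\frac{40 - cfb}{fw}\right) i = 64.5 + \frac{40 - 38}{24} \times 5 = 64.5 + \frac{2}{24} \times 5 = 64.5 + 0.417 = 64.917$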
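The steps of Example 8.6 can be sketched in Python as a check on the hand calculation. This is only an illustration: the function name `grouped_percentile` and the way the table is stored (lowest score of each class with its frequency, lowest class first) are assumptions made for the sketch, not part of the course text.

```python
def grouped_percentile(classes, x, width):
    """Return the x-th percentile P_x of grouped data.

    classes: (lowest score in class, frequency) pairs, lowest class first.
    width:   the interval size i.
    """
    n = sum(freq for _, freq in classes)
    target = x * n / 100        # xN/100, the number of cases lying below P_x
    cfb = 0                     # cumulative frequency below the current class
    for lowest_score, fw in classes:
        if fw > 0 and cfb + fw >= target:
            real_lower_limit = lowest_score - 0.5   # L
            return real_lower_limit + (target - cfb) / fw * width
        cfb += fw
    raise ValueError("x must lie between 0 and 100")


# Frequency table of Example 8.6, written from the lowest class upwards.
EXAMPLE_8_6 = [(40, 2), (45, 4), (50, 8), (55, 10), (60, 14), (65, 24),
               (70, 50), (75, 40), (80, 33), (85, 8), (90, 6), (95, 1)]

for x in (50, 40, 20):
    print(f"P{x} = {grouped_percentile(EXAMPLE_8_6, x, 5):.3f}")
```

The same function gives the quartiles (x = 25 and 75) and the deciles (x = 10, 20, ..., 90), since they are all points located with the same formula.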
Activity 8.8
C. I. 91-99 82-90 73-81 64-72 55-63 46-54 37-45 28-36 19-27 10-18 1-9
Frequency 2 3 5 9 15 18 15 9 5 3 2
4.0 CONCLUSION
5.0 SUMMARY
In this unit you have been exposed to some of the measures of variability
which are measures that show the spread of the scores in a given
distribution. The measures you have seen so far are:
i. The range:- which simply shows the difference between the highest
and the lowest observations or numbers.
ii. The quartiles are points which divide the distribution of scores into
four equal parts called quarters. The formula is
$Q_1 = L + \left(\frac{N/4 - cfb}{fw}\right) i$
iii. The deciles are also points on the distribution that divide the
distribution into ten equal parts or tenths. The formula is
$D_1 = L + \left(\frac{N/10 - cfb}{fw}\right) i$
iv. The distance from Q1 to Q3 is called the interquartile range, while half of
this is called the semi-interquartile range or quartile deviation.
v. Percentiles: which are points on the score scale that divide the
distribution into 100 equal parts called centiles or percentages. The
formula is
$P_x = L + \left(\frac{\frac{xN}{100} - cfb}{fw}\right) i$
The 1st quartile Q1 corresponds to the 25th percentile, the 3rd quartile
Q3 corresponds to the 75th percentile, while the 2nd quartile Q2, which is
the median, corresponds to the 50th percentile and the 5th decile.
Unit 9
MEASURES OF VARIABILITY/DISPERSION II
1.0 INTRODUCTION
In unit 7, you learnt the various measures of central tendency and you were
told that these measures are a set of bench marks which make precise and
brief presentation or description of a set of data. You also learnt that the
measures of location are very useful in providing a concise index of the
average of a set of scores. But there is more to know about sets of
scores.
Variability is a universal characteristic of any set of scores with which the
teacher, the psychologist or researcher might have to deal. Some
distributions may have the same mean yet differ in the extent of variation of
the scores around the measure of central tendency.
2.0 OBJECTIVE
Deviation of a score involves trying to find out how far that score is away
from the mean. Look at the following set of scores for instance: 10, 15, 13,
12, 8. The mean is 58/5 = 11.6. The deviation of 10 from the mean 11.6 =
10 - 11.6 = -1.6. In the same way, the deviation of 15 from the mean =
15 - 11.6 = 3.4, the deviation of 13 from the mean = 13 - 11.6 = 1.4, etc.
the set of scores. It can also be said to be the average of the modulus or
positive values of the differences between the individual scores and the mean
of the set of scores. In mathematical notation, mean deviation
$= \frac{\sum |X - \bar{X}|}{N}$ or $\frac{\sum f|X - \bar{X}|}{N}$
where $|X - \bar{X}|$ is the positive value or modulus.
Example 9.1
Find the mean deviation of the listed scores below 41, 27, 19, 9, 23, 31, 25,
28, 15, 22, 35.
Steps to follow:
S/No   X    X - X̄      D
1      41   41 - 25    16
2      27   27 - 25    2
3      19   19 - 25    -6
4      9    9 - 25     -16
5      23   23 - 25    -2
6      31   31 - 25    6
7      25   25 - 25    0
8      28   28 - 25    3
9      15   15 - 25    -10
10     22   22 - 25    -3
11     35   35 - 25    10
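As a check on the table of Example 9.1, the mean deviation of listed scores can be sketched in a few lines of Python. The function name `mean_deviation` is chosen here for illustration only. The sketch finds the mean (25), takes the modulus of each deviation D and averages them.

```python
def mean_deviation(scores):
    """Average of the absolute deviations of the scores from their mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


EXAMPLE_9_1 = [41, 27, 19, 9, 23, 31, 25, 28, 15, 22, 35]

mean = sum(EXAMPLE_9_1) / len(EXAMPLE_9_1)
deviations = [x - mean for x in EXAMPLE_9_1]

print("mean =", mean)                           # 25.0
print("sum of deviations =", sum(deviations))   # 0.0, which is why the modulus is needed
print("mean deviation =", round(mean_deviation(EXAMPLE_9_1), 2))  # 74/11 = 6.73
```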
Example 9.2
S/No X F
1 20 2
2 19 3
3 18 5
4 17 10
5 16 15
6 15 9
7 14 5
8 13 2
S/No   X    F    FX    |X - X̄|   F|X - X̄|
1      20   2    40    3.76      7.52
2      19   3    57    2.76      8.28
3      18   5    90    1.76      8.80
4      17   10   170   0.76      7.60
5      16   15   240   0.24      3.60
6      15   9    135   1.24      11.16
7      14   5    70    2.24      11.20
8      13   2    26    3.24      6.48
Total       51   828             64.64
Steps to follow:
i. Find the mean X̄ = 828/51 = 16.24
ii. Complete the composite table to include the scores X, frequencies F,
FX, |X - X̄| and F|X - X̄|.
iii. Find the total sum of F|X - X̄| = 64.64
iv. Find the mean deviation = ΣF|X - X̄| / ΣF = 64.64/51 = 1.27
Note that when you have grouped scores, you will use the mid points of the
class interval as X.
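The composite table of Example 9.2 can be reproduced with a short Python sketch. The name `mean_deviation_freq` and the (score, frequency) layout are assumptions made for illustration. For grouped scores you would pass the mid point of each class interval as the score.

```python
def mean_deviation_freq(table):
    """Mean deviation for a frequency table of (score, frequency) pairs."""
    n = sum(f for _, f in table)                    # sum of F
    mean = sum(x * f for x, f in table) / n         # sum of FX divided by sum of F
    return sum(f * abs(x - mean) for x, f in table) / n


EXAMPLE_9_2 = [(20, 2), (19, 3), (18, 5), (17, 10),
               (16, 15), (15, 9), (14, 5), (13, 2)]

print(round(mean_deviation_freq(EXAMPLE_9_2), 2))   # 1.27
```

The sketch keeps the mean unrounded, so its intermediate totals differ slightly from the hand-worked 64.64, but the result agrees to two decimal places.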
Activity 9.1
3.2 VARIANCE
You have seen that the deviation scores, which you have studied in 3.1
above, provide a good basis for measuring the spread of scores in a
distribution. But we cannot use the sum of these deviations in order to get
an index of spread, because this sum in any distribution will be equal to zero.
This becomes a problem which we must overcome. To do this, square all the
deviation scores. This removes all the negative signs, because all squared
scores are positive. Then these squared deviation scores are added to give a
measure called the sum of the squared deviation scores, which is simply
called the sum of squares, $\sum (X - \bar{X})^2$.
The variance therefore is a measure of variability which is derived from the
deviation of scores from the mean. It is defined as the mean of the squared
deviation scores. It is more widely used in inferential statistics than in
descriptive statistics. The population variance is symbolized by the lower
case Greek letter sigma (σ) raised to the second power, i.e. σ², while the
sample variance is represented by S². For the purpose of this course we shall
be using S² since the difference is not noticeably high.
For listed scores; using 15, 14, 11, 10, 9, 7, 4. Find the variance.
Steps to follow:
Find the variance of the data below, which are the scores of 15 students in a
10-item multiple choice test on maths: 7, 2, 6, 8, 4, 3, 5, 9, 6, 1, 6, 8, 0, 7, 3.
Steps to follow:
1. Complete the composite table to include the scores X, F, FX, X - X̄,
(X - X̄)² and F(X - X̄)².
X      F    FX    X - X̄    (X - X̄)²   F(X - X̄)²
19     2    38    4.13     17.057     34.114
18     3    54    3.13     9.797      29.391
17     5    85    2.13     4.537      22.685
16     10   160   1.13     1.277      12.769
15     15   225   0.13     0.017      0.254
14     9    126   -0.87    0.757      6.812
13     5    65    -1.87    3.497      17.485
12     3    36    -2.87    8.237      24.711
10     2    20    -4.87    23.717     47.434
9      1    9     -5.87    34.457     34.457
Total  55   818                       230.112
vi. Find the variance S² using
$S^2 = \frac{\sum f(X - \bar{X})^2}{\sum f} = \frac{230.112}{55} = 4.184$
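The variance of a frequency table can be checked with the following Python sketch. The function name `variance_freq` and its `sample` switch are assumptions made for illustration. Dividing the sum of squares 230.112 by 55 gives about 4.18. The switch uses N - 1 instead of N when the scores are treated as a sample.

```python
def variance_freq(table, sample=False):
    """Variance of a frequency table of (score, frequency) pairs.

    Divides the sum of squares by N for a population, or by N - 1
    when sample=True.
    """
    n = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / n
    sum_of_squares = sum(f * (x - mean) ** 2 for x, f in table)
    return sum_of_squares / (n - 1 if sample else n)


TABLE = [(19, 2), (18, 3), (17, 5), (16, 10), (15, 15),
         (14, 9), (13, 5), (12, 3), (10, 2), (9, 1)]

print(round(variance_freq(TABLE), 3))               # 4.184
print(round(variance_freq(TABLE, sample=True), 3))  # slightly larger, divides by 54
```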
Activity 9.3
Find the variance of the data below.
X 25 30 28 22 15 34 14 26 16 20 32 12 8 10
F 7 5 5 9 10 2 8 15 9 12 3 4 1 2
Note that:
i. You shall use N when dealing with population and N-1 when dealing with
the samples.
ii. You shall use the mid points or class mark when you are given grouped data.
Example 9.5:
Steps to follow:
ii. Using the formula
$S^2 = \frac{n\sum fx^2 - (\sum fx)^2}{n^2} = \frac{809000 - 697225}{1600} = \frac{111775}{1600} = 69.859$
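The raw score formula can be sketched in Python as below. The function name `variance_raw` is chosen for illustration. The totals n = 40, Σfx = 835 and Σfx² = 20225 are not printed in the example; they are back-calculated here from the printed figures (1600 = 40², 697225 = 835², 809000 = 40 x 20225) and should be read as an assumption.

```python
def variance_raw(n, sum_fx, sum_fx2):
    """Variance from raw totals: (n * sum(fx^2) - (sum(fx))^2) / n^2."""
    return (n * sum_fx2 - sum_fx ** 2) / n ** 2


# Totals back-calculated from the printed arithmetic (assumed, see above).
n, sum_fx, sum_fx2 = 40, 835, 20225

print(n * sum_fx2, sum_fx ** 2)                    # 809000 697225
print(round(variance_raw(n, sum_fx, sum_fx2), 3))  # 69.859
```

This form avoids working out each deviation from the mean, which is convenient when the mean is not a whole number.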
Activity 9.4
Find the variance of the grouped data given below.
Class int 56-60 51-55 46-50 41-45 36-40 31-35 26-30 21-25 16-20
Frequency 3 4 5 6 10 18 15 11 8
You have gone through the variance and the methods and processes involved
in calculating the variance. If you have learnt it very well, then you will not
have any difficulty in mastering the methods and processes involved in
calculating the standard deviation. This is because the standard deviation is
simply the square root of the variance. It is by far the most commonly used
indicator of degree of dispersion and is the most dependable estimate of the
variability in the population from which the sample is drawn. It also enters
into numerous other statistical formulas which we shall see later in this
course. The standard deviation is a kind of average of all the deviations from
the mean, but it is not a simple arithmetic mean. The symbol is S for the
sample and σ for the population.
To compute the standard deviation, you will have to find the variance, then
find the square root of the variance.
Example 9.6
Find the standard deviation of the scores below.
S/N    X    Dev   Dev²
1      15   5     25
2      14   4     16
3      11   1     1
4      10   0     0
5      9    -1    1
6      7    -3    9
7      4    -6    36
Total  70   0     88
Steps to follow.
$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{88}{7}} = \sqrt{12.57} = 3.55$
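Example 9.6 can be verified with the short Python sketch below, which follows the table column by column and then compares the result with `statistics.pstdev` from the Python standard library (the population standard deviation, dividing by N). The variable names are chosen for illustration.

```python
import math
import statistics

scores = [15, 14, 11, 10, 9, 7, 4]

mean = sum(scores) / len(scores)                # 70 / 7 = 10
deviations = [x - mean for x in scores]         # the Dev column
sum_of_squares = sum(d ** 2 for d in deviations)  # 88
variance = sum_of_squares / len(scores)         # 88 / 7 = 12.57
sd = math.sqrt(variance)

print(round(sd, 2))                         # 3.55
print(round(statistics.pstdev(scores), 2))  # same value from the library
```

If the seven scores are treated as a sample, `statistics.stdev(scores)` divides by N - 1 instead and gives a slightly larger value.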
Steps to follow:
i. Find the squares of the raw scores, X².
X     10    8    7    6    3   2    ΣX = 37
X²    100   64   49   36   9   4    ΣX² = 263
ii. Find the sum of the scores and squared scores, i.e. ΣX = 37, ΣX² = 263.
iii. Apply the formula
$S = \sqrt{\frac{N\sum X^2 - (\sum X)^2}{N^2}}$
= 2.4094605 = 2.409
Activity 9.5
X 15 11 9 7 5 3 1
F 1 1 2 2 1 2 1
4.0 Conclusion
In this unit you have learnt that apart from the usefulness of the measures of
central tendency for providing a concise index of the average value of a set
of scores, there is more to be studied about a set of scores. Variability, which
is a universal characteristic of any set of scores, is a very important attribute
with which the teacher, the psychologist, the social scientist, the researcher,
etc. might have to deal. For instance, the measures of achievement,
intelligence, personality and/or other characteristics may be expected to show
variability in any sample of individuals.
5.0 Summary
In this unit you have been told that the measures of variability or dispersion
are very necessary for adequately describing quantitative distributions. The
three measures which you have studied in this unit are the mean deviation, the
variance and the standard deviation. The mean deviation is the average of
the deviations of the scores or observations from the mean, given that all the
deviations are taken as positive. It uses the modulus in computation. Thus mean
deviation
$MD = \frac{\sum |X - \bar{X}|}{N}$
The variance is the mean of the squared deviation scores and is given by
$\frac{\sum (X - \bar{X})^2}{N}$ or $\frac{N\sum fx^2 - (\sum fx)^2}{N^2}$
The standard deviation, which is the most common of all the measures of
variability, is the square root of the variance. It belongs to the interval
scale of measurement and is given by
$\sqrt{\frac{\sum (X - \bar{X})^2}{N}}$ or $\sqrt{\frac{N\sum fx^2 - (\sum fx)^2}{N^2}}$
Using the data below, find the variance and standard deviation.
X 25 24 23 22 21 20 19 18 17 16
F 1 2 3 6 11 16 7 9 8 2
UNIT 10
1.0 Introduction:
Earlier, you learnt how to plot graphs like the frequency curve or cumulative
frequency curve. You have noticed that different curves have different
shapes. One of man's most interesting discoveries was the determination of
a relationship between measurements of many types of natural phenomena
and the mathematical laws of chance. If the distributions of many natural
events are plotted on a frequency curve, the shape will be like a bell. This is
called the normal curve. In other words, measurements observed in many
physical and psychological phenomena will produce an approximately normal
curve. For instance, if the heights of randomly selected people in a community
are taken and plotted on graph paper, they will give an approximately normal
curve. But apart from the normal curve, some other shapes which are not
normal may be observed in some cases. In this unit we shall look at the
different shapes of curves that can be observed.
2.0 Objectives:
Any symmetrical bell-shaped type of curve is known as a normal curve. The
concept of the normal curve is very basic in statistics. This is because the
frequency distributions of many natural events have shapes similar to that of
a normal curve. In other words, many physical and psychological phenomena,
when shown in a frequency distribution curve, will resemble the normal
curve. Take for instance the weights of the girls in a school, the heights of
men in a church, the achievement scores of students in a class, etc. If you get
these measurements and plot them on graphs, they will be similar to the
normal curve.
Activity 10.1
Get the scores of the students in a class in one subject. Use the scores to plot
a graph of frequency against the scores.
You have seen that the shape of the curve is bell shaped. Similarly, the
distributions of scores on many psychological tests, such as IQ tests and tests
of school achievement, are approximated by the normal curve.
Fig. 10.1: The normal curve (horizontal axis: scores)
Although the fit may not be perfect, you will see that the distribution of
scores closely approximates the normal curve. The equation of the normal
curve is
$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$
where μ is the mean and σ is the standard deviation of the distribution.
[Figure: three normal curves labelled A, B and C]
From the figure above, you will see that the three curves are normal
curves, yet they are different in shape and appearance. Curve A is narrow
and the ordinate is relatively long. Curve C is wide and the ordinate is
relatively short. B is not too narrow nor too wide.
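The difference between curves A, B and C can be illustrated with a short Python sketch of the ordinate of a normal curve with mean μ and standard deviation σ, that is, y = exp(-(X - μ)² / (2σ²)) / (σ√(2π)). The function name `normal_ordinate` and the three standard deviations (5, 10 and 20 around a mean of 50) are hypothetical values chosen only to mimic a narrow, a medium and a wide curve.

```python
import math


def normal_ordinate(x, mu, sigma):
    """Height y of the normal curve at score x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


MEAN = 50
# Hypothetical spreads: a small sigma gives a narrow, tall curve (like A),
# a large sigma gives a wide, short curve (like C).
CURVES = {"A": 5, "B": 10, "C": 20}

for label, sigma in CURVES.items():
    peak = normal_ordinate(MEAN, MEAN, sigma)
    print(f"curve {label}: sigma = {sigma}, height at the mean = {peak:.4f}")

# Symmetry: points equally far on either side of the mean have the same height.
print(normal_ordinate(40, MEAN, 10) == normal_ordinate(60, MEAN, 10))
```

All three curves have the same mean, yet the ordinate at the mean is halved each time the standard deviation is doubled.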
Activity 10.2.
Collect the following data:
i. The result of a class of students' examination in any one subject of your
choice.
ii. Collect the weights of the same students in the class.
iii. Collect the heights of the same students in the class.
Plot three graphs with the data on the same graph. Take note of the shapes
for comparison.
3.2 The Properties of a Normal Curve.
i. A normal curve is symmetrical with its maximum height at the mean. It is
often described as a bell-shaped curve while some describe it as a well-
weathered manure pile.
ii. The mean, median and mode fall at the same point.
iii. The height of the curve decreases as one moves to the left and right of the
point of maximum height.
iv. Although the height of the curve continues to decrease as one moves
farther and farther from the mean, it never actually reaches zero.
Therefore the theoretical range of the normal curve is from minus infinity
(−∞) to plus infinity (+∞).
3.3 Skewness
You remember that the normal curve is symmetrical, but many distributions
produce curves that are not symmetrical but asymmetrical. These curves lean
or bend either to the left or to the right. Such curves are said to be skewed.
Skewness, therefore, is the degree or extent to which a frequency curve is
asymmetric. There are two main types of skewness.
When a curve leans to the left from the observer's view point and the tail
extends out towards the right, it is said to be positively skewed. On the
other hand, if the curve leans to the right and the tail extends outwards to the
left, it is called negatively skewed.
Now, take a look at the diagrams below. Diagram A is a normal curve. B is
positively skewed and C is negatively skewed. From these figures we can say
that:
[Figure: A, normal curve; B, positively skewed curve; C, negatively skewed
curve; the mean, median and mode are marked on each]
i. The mode, the mean and the median have the same value in a normal curve.
ii. In the positively skewed curve the median and the mean lie to the right of
the mode, in the direction of the skewness.
iii. In the negatively skewed curve the median and the mean lie to the left of the
mode. In other words the mean and the median are less than the mode.
You will note again that in a normal curve the mean, the mode and the median do
not differ. But they differ in a skewed distribution. This difference, or a function
of it, may be taken as a measure of skewness. There are several measures of
skewness. They include Pearson's first and second coefficients of skewness,
which are:
i. Skewness = (mean - mode) / standard deviation = $\frac{\bar{X} - \hat{X}}{S}$
ii. Skewness = 3(mean - median) / standard deviation = $\frac{3(\bar{X} - \tilde{X})}{S}$
Example 10.1
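Steps to follow.
iii. Find the standard deviation
$S = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum F}} = \sqrt{\frac{3114.99}{100}} = \sqrt{31.1499} = 5.5812095 = 5.58$
iv. Find the mode. From the table it is 18, i.e.
$16.5 + \left(\frac{15}{15 + 15}\right) 3$
v. Find the median
$= L + \left(\frac{N/2 - cfb}{fw}\right) i = 16.5 + \frac{(50 - 39)\,3}{30} = 16.5 + 1.10 = 17.60$
vi. Coefficient of skewness
$= \frac{\bar{X} - \hat{X}}{S} = \frac{17.01 - 18}{5.58} = \frac{-0.99}{5.58} = -0.177$
OR
$\frac{3(\bar{X} - \tilde{X})}{S} = \frac{3(17.01 - 17.60)}{5.58} = \frac{-1.77}{5.58} = -0.317$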
From the result using Pearson's coefficient of skewness, you can
see that the curve is negatively skewed.
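The two Pearson coefficients can be sketched in Python as follows. The function names are chosen for illustration. The mean (17.01), mode (18) and standard deviation (5.58) are taken from the steps of Example 10.1; the median is recomputed in the sketch on the assumption that the median class is 17-19, whose real lower limit is 16.5, the same limit used for the mode.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / S."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson's second coefficient of skewness: 3(mean - median) / S."""
    return 3 * (mean - median) / sd


mean, sd = 17.01, 5.58
mode = 16.5 + 15 / (15 + 15) * 3        # 18.0
median = 16.5 + (50 - 39) / 30 * 3      # L + ((N/2 - cfb) / fw) * i, assumed L = 16.5

sk1 = pearson_first(mean, mode, sd)
sk2 = pearson_second(mean, median, sd)
print(round(sk1, 3), round(sk2, 3))
print("negatively skewed" if sk1 < 0 and sk2 < 0 else "not negatively skewed")
```

Both coefficients come out negative, so either one leads to the same description of the curve.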
Activity. 10.3
Find the coefficient of skewness in the distribution below.
Class 5-7 8-10 11-13 14-16 17-19 20-22
Freq 8 48 26 16 12 8
3.4 Kurtosis
So far you have seen that normal curves are symmetrical and can be used as
the basis for certain comparisons in handling curves. You have also seen
that curves which are not normal may be skewed either to the left or to the
right. There is yet another characteristic of the form of curves. This is
called kurtosis. The word kurtosis is derived from a Greek word, kyrtos,
which means curved. Kurtosis therefore describes the peakedness or flatness
of a curve around the mode in a distribution of scores. The three types of
kurtosis are:
i. Platykurtic (platy means flat in Greek). This has a broad, relatively flat
appearance. It is relatively flat topped.
[Figures i to iv: curves illustrating kurtosis]
Example 10.2
Class 26-28 23-25 20-22 17-19 14-16 11-13 8-10 5-7 2-4
Freq 6 10 15 30 15 10 7 5 2
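Steps to follow:
i. Complete the composite table.
ii. Find $Q = L + \left(\frac{N/4 - cfb}{fw}\right) i$, where N/4 = 100/4 = 25
iii. Find $Q_1 = L + \left(\frac{25 - 24}{15}\right) 3 = 13.5 + \left(\frac{1}{15} \times \frac{3}{1}\right) = 13.5 + 0.2 = 13.7$
iv. Find $Q_3 = L + \left(\frac{75 - 69}{15}\right) 3 = 19.5 + \left(\frac{6}{15} \times 3\right) = 19.5 + 1.2 = 20.7$
v. Find $P = L + \left(\frac{\frac{xN}{100} - cfb}{fw}\right) i$, where N/100 = 100/100 = 1
vi. Find $P_{90} = L + \left(\frac{90 - 84}{10}\right) 3 = 22.5 + \left(\frac{6}{10} \times \frac{3}{1}\right) = 22.5 + 1.8 = 24.3$
vii. Find $P_{10} = L + \left(\frac{10 - 7}{7}\right) 3 = 7.5 + \left(\frac{3}{7} \times \frac{3}{1}\right) = 7.5 + 1.29 = 8.79$
viii. Find $K = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}} = \frac{\frac{1}{2}(20.7 - 13.7)}{24.3 - 8.79} = \frac{3.5}{15.51} = 0.2256608 = 0.23$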
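The steps of Example 10.2 can be sketched in Python as a check. The helper `grouped_percentile` and the (lowest score, frequency) layout are assumptions made for illustration; the index itself is K = half of (Q3 - Q1) divided by (P90 - P10), as in step viii.

```python
def grouped_percentile(classes, x, width):
    """x-th percentile of grouped data: L + ((xN/100 - cfb) / fw) * i."""
    n = sum(freq for _, freq in classes)
    target = x * n / 100
    cfb = 0                                   # cumulative frequency below
    for lowest_score, fw in classes:
        if fw > 0 and cfb + fw >= target:
            return (lowest_score - 0.5) + (target - cfb) / fw * width
        cfb += fw
    raise ValueError("x must lie between 0 and 100")


def kurtosis_index(classes, width):
    """K = ((Q3 - Q1) / 2) / (P90 - P10)."""
    q1 = grouped_percentile(classes, 25, width)
    q3 = grouped_percentile(classes, 75, width)
    p10 = grouped_percentile(classes, 10, width)
    p90 = grouped_percentile(classes, 90, width)
    return ((q3 - q1) / 2) / (p90 - p10)


# Table of Example 10.2, written from the lowest class (2-4) upwards.
EXAMPLE_10_2 = [(2, 2), (5, 5), (8, 7), (11, 10), (14, 15),
                (17, 30), (20, 15), (23, 10), (26, 6)]

print(round(kurtosis_index(EXAMPLE_10_2, 3), 2))   # 0.23
```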
Activity 10.4
Find the index of kurtosis in the distribution below
Class 5-7 8-10 11-13 14-16 17-19 20-22
Freq 8 48 26 16 12 8
4.0 Conclusion
You have gone through the measures of central tendency. You have seen that
when a set of data is appreciably or greatly skewed, the median is better than
the mean. In this unit you have learnt how to find out the degree to which a
set of data is skewed, using measures of skewness. You have also seen how to
find out how spread out or bunched up a set of scores is, which is
technically referred to as kurtosis. As a teacher, researcher or even social
scientist, you are often confronted with large masses of data, usually scores
of some type, which require interpretation. If they are to be useful, you have
to summarize the data by using graphical presentation. This will
show you at a glance the degree of skewness and kurtosis, or whether the curve
produced is a normal curve.
5.0 Summary
In this unit, you have learnt that the normal curve is a frequency curve of a
theoretical distribution which is unimodal and symmetrical, with the mean,
median and the mode at the same point. The height is greatest at this point
and decreases on both sides of this point to form a bell-shaped curve. The
theoretical range of the normal curve is from −∞ to +∞, but most of the area
lies between −3 and +3 standard deviations from the mean. You now know that
one of the most immediately obvious characteristics of the form of a graphed
frequency distribution is its symmetry or lack of symmetry or balance. A curve
is symmetrical in shape if one side is a mirror image of the other. But when
one side is not a mirror image of the other, it is asymmetrical, and this is
characterized by a high point or lump that is off-centre and by tails of
distinctly unequal length. This is called a skewed curve. The lump indicates
the scores with the highest frequencies. Skewness can be positive or
negative.
UNIT 11
1.0 Introduction
So far we have focused on statistical procedures used for describing
single variables, or for analyzing what we may call univariate distributions.
You have learnt how to compute the measures of central tendency and
variability, but these always come from one variable, such as test scores.
We also need statistical methods that can be used to investigate the
relationship that may exist between two variables in a population or sample.
2.0 Objectives
Sometimes we are faced with such questions as: Is there a relationship
between students' achievement in mathematics and their achievement in the
sciences? Does socioeconomic status affect school achievement? Is there any
relationship between aptitude in engineering and performance in engineering
courses? What is the relationship between time used in mathematics drills
and students' achievement in mathematics? Is there any relationship between
the scores of candidates in the Common Entrance Examination and their scores
in JSCE? These questions can be answered using a statistical procedure or
technique called correlation. In other words correlational methods are used
You will have to note again that if two sets of scores do not have a common
source we cannot employ correlation. In other words there must be logical
bases for pairing the variables before correlation can be employed.
Activity 11.1
Scatter diagrams, or simply scatter grams, were used before calculators and
even computers were as readily available as they are now. They were also used
when samples to be correlated were large, or even moderate in size. The
common procedure was to group data in both X and Y and to prepare a
scatter gram or correlation diagram to provide some shortcuts in calculation.
It is also a way to show correlation visually. A scatter gram therefore is a
graph in which a single dot is used to locate each individual on two
dimensions. The pattern formed by the dots shows the correlation.
To construct a scatter gram we first lay out the scale for one variable on the
abscissa and the scale for the other on the ordinate. By now you are very
familiar with the construction of graphs having constructed many of them
earlier in this course. But note that by convention the variable on the abscissa
is labeled X and the variable on the ordinate is labeled Y.
Fig. 11.1: Scatter diagrams (a) to (f); in each diagram X is on the abscissa
and Y is on the ordinate.
You will have to note that in reading or interpreting the scatter gram,
when high scores on one variable are associated with high scores on a second
variable, and low scores on the one are associated with low scores on the
other, the variables are said to be positively correlated, or to show positive
correlation. Fig. 11.1 above gives you an idea of the shapes of some scatter
diagrams. From the figures: (a) shows high positive relationship, (b) shows
moderate positive relationship and (c) shows no or zero relationship; (e)
shows moderately negative relationship while (d) shows high negative
relationship.
You will also take note that in a positive correlation the dots or x marks in
the scatter gram spread from lower left to upper right, while in a negative
correlation the marks spread from upper left to lower right. Note also that in
some cases the variables show no tendency to vary together. In other
words, individuals scoring high on one variable score neither
systematically high nor systematically low on the other variable. The result
is that there is no or zero correlation between the two variables. The marks
are spread at random on the scatter gram.
Activity 11.2
You have learnt how to plot graphs, so talking about X and Y axes may not
be new to you. We are going to use the same method to set up a two-way
grouping of data, having a table prepared in columns and rows. A bivariate
frequency distribution is another way to show a correlation visually. Values
of the X variable are shown on the abscissa, while values of the Y variable
are shown on the ordinate. In other words,
there are columns for the dispersions of Y scores within each score or class
interval for the X scale, and rows for the dispersions of X scores within each
of the intervals for the Y scale. Along the top of the table are listed the score
limits for the class intervals for X scores. Along the left hand margin are
listed the score limits for the intervals of Y scores. A tally mark shows each
Example 11.1
Construct a bivariate frequency distributions of scores in two tests of
students given below.
Test B
Class Int   60-69  70-79  80-89  90-99  100-109  110-119  120-129  130-139  140-149
F           5      10     10     14     10       8        10       6        5
Steps to follow:
i. Set up the two-way table as shown below to show the number of rows
and columns required.
ii. Fix the classes as follows: test A on the right and test B on top of the table.
iii. Fix the frequencies on the opposite sides.
iv. Match the scores and tally as shown.
Activity 11.3
Construct a bivariate frequency distributions of the test below and give your
interpretation.
Test A
Class Int 1-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40
F 2 3 5 8 10 7 4 1
Test B
Class Int 2-4 5-7 8-10 11-13 14-16 17-19 20-22 23-25 26-28 29-31 32-34
F 1 3 3 3 4 5 8 4 5 2 1
You have seen that mere inspection of a scatter gram furnishes you with
some general information on the relationship between two sets of measures
for a given group. It can give you an idea of the type and direction of the
relationship, but it does not give you the degree or extent of the relationship.
Therefore a numerical index indicating precisely the degree of relationship is
much more helpful and highly required.
Steps to follow:
v. Find the deviations of X and Y scores and complete the composite table.
vi. Applying the formula we have
$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{587}{\sqrt{270 \times 2614.94}} = \frac{587}{\sqrt{706033.8}} = 0.70$
Compute the Pearson r for the same sets of data using the raw score method
Steps to follow:
S/No X Y XY X² Y²
1 9 23 207 81 529
2 13 40 520 169 1600
3 6 10 60 36 100
4 18 48 864 324 2304
5 14 25 350 196 625
6 12 30 360 144 900
7 11 15 165 121 225
8 7 10 70 49 100
9 2 5 10 4 25
10 6 45 270 36 2025
11 14 40 560 196 1600
12 15 35 525 225 1225
13 5 12 60 25 144
14 8 27 216 64 729
Total 140 365 4237 1670 12131
v. Apply the formula
$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$
$r = \frac{14 \times 4237 - 140 \times 365}{\sqrt{(14 \times 1670 - 140^2)(14 \times 12131 - 365^2)}} = \frac{59318 - 51100}{\sqrt{138382020}} = \frac{8218}{11763.589}$
= 0.6985963 = 0.70
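Both ways of computing the Pearson r can be sketched in Python and compared on the 14 pairs of scores in the table above. The function names `pearson_r_raw` and `pearson_r_deviation` are chosen for illustration. The raw score method uses the totals ΣX, ΣY, ΣXY, ΣX² and ΣY²; the deviation method uses Σxy, Σx² and Σy², where x and y are deviations from the means.

```python
import math


def pearson_r_raw(xs, ys):
    """Raw score method, using the totals of X, Y, XY, X squared, Y squared."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviation(xs, ys):
    """Deviation method: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


X = [9, 13, 6, 18, 14, 12, 11, 7, 2, 6, 14, 15, 5, 8]
Y = [23, 40, 10, 48, 25, 30, 15, 10, 5, 45, 40, 35, 12, 27]

print(round(pearson_r_raw(X, Y), 2))        # 0.7
print(round(pearson_r_deviation(X, Y), 2))  # 0.7
```

The two methods are algebraically the same, so any difference between them in practice comes only from rounding in the hand calculation.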
Activity 11.4
Using both the deviation and raw score methods calculate the correlation
coefficient of the scores below.
Maths 28 46 11 34 9 43 21 30 17 25 40 5 48 32 22 16 44 14 37
Phy. 30 33 22 38 7 40 18 26 15 24 36 9 44 30 20 12 46 27 21
Before you finish this unit, you should be informed about some restrictions
that should be observed in the use of the Pearson product moment coefficient
of correlation. You have been told that the Pearson r is a meaningful index of
the relationship between two variables but the data must meet certain
underlying assumptions.
Now that you have seen the assumptions underlying the Pearson r, let us
move further to look at the factors influencing the correlation coefficient.
Whenever you are interpreting a correlation coefficient you should always
consider the nature of the population in which the two variables were
observed. The correlation coefficient observed between two variables will
vary from one population to another because:
4.0 Conclusion:
In this unit you have learnt some of the measures of association. You have
learnt that these measures seek to find out the tendency of scores of two
variables to change together, either in the same direction or in the opposite
direction. This is made possible because of the understanding that scientific
progress depends upon finding out the relationship between things.
Correlation is one of the best statistical methods used for finding out these
relationships, especially in psychology, education and the behavioural
sciences. In the sciences such as Biology, Chemistry or Physics, we
have measures which give perfect relationship, e.g. the longer a piece of
metal, the heavier it is, and therefore the greater its volume. But the height
of a man cannot be in perfect relationship with the weight of the man. So in
the social or behavioural sciences we rarely have perfect association.
5.0 Summary
In this unit you have gone through the concept of correlation, which refers to
the extent to which two variables are related in a population or sample. You
have seen that correlation can be illustrated graphically using the scatter gram
or the bivariate frequency distribution, where the values of the two variables
X and Y for each member are plotted as points or dots or marked X. Mere
inspection of the construction will show the type and direction of the
association. If the plotted points run from lower left to upper right, it
indicates positive relationship, but if the points run from upper left to lower
right, it is negative correlation, and if the points are scattered in a random
fashion all over the graph, it is indicative of zero correlation between the
variables. When the plotted points all lie on a straight line, it shows perfect
correlation. When the points are removed from a straight line, the degree of
relationship is less. But to get the index of the degree of relationship, we
apply the coefficient of correlation, which is a numerical index. The most
commonly used is the Pearson product moment correlation coefficient, which
can use the deviation method or the raw score method.
6.0 Tutor Marked Assignment
Use any of Pearson's methods to find the correlation coefficient of the two
sets of data below.
X 6 7 7 8 8 9 10 11 12 12 13 13 15 16 18 19 20 21 21
Y 20 16 18 17 17 16 15 13 14 13 10 14 10 9 8 6 7 4 6
UNIT 12
1.0 Introduction
In the last unit you went through the Pearson product-moment correlation
coefficient, which we described as the best known and the most frequently
used index of relationship. But there are data or situations to which the
Pearson r cannot be applied, and there are instances in which it can be
applied, but in which, for practical purposes, other procedures are more
expedient. The Pearson r is most defensibly computed when the two
variables X and Y are measured on continuous metric scales and the
regressions are linear. Many data are in frequencies or are on nominal scales;
in this case the Pearson r cannot be applied. These and other situations such
as if X or Y variable are measured:
2.0 Objectives
You have learnt that various types of values or data are suitable for different
correlational computations. In other words, you are aware that correlation
coefficients and the computational techniques or models for obtaining them
do vary. These variations depend on the type of values assigned to or taken
by the variables being correlated. The types of values taken by or assigned to
variables that may be correlated include:
Diagrammatically, pairs of these variable types and the type of correlation
coefficient that can be used for each pair are shown below.

Types of values a       i. Continuous     ii. Ranks       iii. Naturally    iv. Artificially   v. Three or more
variable takes          or raw scores                     dichotomized      dichotomized       categories
----------------------------------------------------------------------------------------------------------------
i. Continuous/          Pearson r                         Point biserial    Biserial rbi
   raw scores                                             rpbi
ii. Ranks                                 Spearman rho
iii. Naturally          Point biserial                    Phi coefficient
     dichotomized       rpbi
iv. Artificially        Biserial rbi                                        Tetrachoric
    dichotomized                                                            coefficient r.tet
v. Three or more                                                                               Contingency
   categories                                                                                  coefficient C
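The table can also be read as a simple lookup from a pair of variable types to a coefficient. The Python sketch below encodes only the pairs that appear in the table; the constant names and the function `choose_coefficient` are illustrative names introduced here, not part of the course text.

```python
# Variable types, labelled as in the table of coefficients.
CONTINUOUS = "continuous"
RANKS = "ranks"
NATURAL_DICHOTOMY = "naturally dichotomized"
ARTIFICIAL_DICHOTOMY = "artificially dichotomized"
CATEGORIES = "three or more categories"

# A frozenset is used as the key so that the order of the two variables
# does not matter; a pair of identical types collapses to a single element.
COEFFICIENTS = {
    frozenset([CONTINUOUS]): "Pearson r",
    frozenset([RANKS]): "Spearman rho",
    frozenset([CONTINUOUS, NATURAL_DICHOTOMY]): "Point biserial rpbi",
    frozenset([CONTINUOUS, ARTIFICIAL_DICHOTOMY]): "Biserial rbi",
    frozenset([NATURAL_DICHOTOMY]): "Phi coefficient",
    frozenset([ARTIFICIAL_DICHOTOMY]): "Tetrachoric coefficient r.tet",
    frozenset([CATEGORIES]): "Contingency coefficient C",
}


def choose_coefficient(type_x, type_y):
    """Return the coefficient the table lists for this pair, or None if there is no entry."""
    return COEFFICIENTS.get(frozenset([type_x, type_y]))


print(choose_coefficient(CONTINUOUS, NATURAL_DICHOTOMY))  # Point biserial rpbi
print(choose_coefficient(RANKS, RANKS))                   # Spearman rho
```

A pair that the table leaves blank, such as ranks with three or more categories, returns None in this sketch.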
In the last unit you were told that the Pearson product-moment correlation
coefficient is the most widely used and that the others are adaptations of
it. This is true of the Spearman-Brown rank-order correlation coefficient,
which was first developed by the British psychologist Charles Spearman and
made popular by both Spearman and Brown. It is denoted by the Greek letter
rho. When the data from both of the variables to be correlated are
measured on an ordinal or rank-order scale, the Spearman rank
coefficient is the technique generally applied.
As a teacher, there are many situations in which you may have ordinal-scale
data to correlate. Such situations may arise when there are
questions concerning the relationship between variables on which students
can be ranked, e.g. questions on interest, sociability, cooperativeness,
socioeconomic status, ability, attitudes towards issues, adjustment and
performance in class, among others. In such cases the Spearman rank-order
coefficient is used.
You have learnt that the Spearman rank coefficient is designed for ranked
data. It can also be used with interval data that have been expressed as ranks.
In this case it is an alternative to the Pearson r, especially when the data set
is small, i.e. not more than 30 cases.
We have said that the Spearman rank correlation makes use of ranks. The use
of ranks instead of the original raw scores results in a marked simplification
in the formula for the correlation coefficient. The formula is given by

\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \quad \text{or} \quad \rho = 1 - \frac{6\sum d^2}{n(n + 1)(n - 1)} \]

where d = the difference between a subject's ranks on the two measures,
\sum d^2 = the sum of the squared differences, and n = the number of subjects.
Example 12.1
The scores of 10 students in two subjects, Physics and Technical Drawing, are
given below. Compute the correlation coefficient using the Spearman rho.

Phy   45  50  80  68  10  42  65  50  25  70
T.D   60  90  60  72  30  88  70  60  40  75

Steps to follow:
i. Set up a composite table as shown below.

         Scores in       Ranks in
S/No    Phy    T.D      Phy    T.D       D        D²
  1      45     60       7      7        0        0
  2      50     90       5.5    1        4.5     20.25
  3      80     60       1      7       -6       36
  4      68     72       3      4       -1        1
  5      10     30      10     10        0        0
  6      42     88       8      2        6       36
  7      65     70       4      5       -1        1
  8      50     60       5.5    7       -1.5      2.25
  9      25     40       9      9        0        0
 10      70     75       2      3       -1        1
                                       ΣD² =     97.50
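ii. Substitute the sum of the squared rank differences and n = 10 into the formula:

\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(97.50)}{10(10^2 - 1)} = 1 - \frac{585}{990} = 0.41 \]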
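The steps of Example 12.1 can be checked with a short Python sketch. It ranks each set of scores from highest (rank 1) to lowest, gives tied scores the mean of the ranks they occupy, and then applies the rho formula used in this unit. The function names are illustrative. Note that this simplified formula is exact when there are no ties and only approximate when ties occur, as they do here.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; tied scores share the mean rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for x in scores:
        first = ordered.index(x) + 1           # first 1-based position of x in sorted order
        ties = ordered.count(x)                # how many scores are equal to x
        ranks.append(first + (ties - 1) / 2)   # mean of positions first .. first + ties - 1
    return ranks


def spearman_rho(x, y):
    """Return (rho, sum of squared rank differences) using rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


phy = [45, 50, 80, 68, 10, 42, 65, 50, 25, 70]
td = [60, 90, 60, 72, 30, 88, 70, 60, 40, 75]
rho, sum_d2 = spearman_rho(phy, td)
print(sum_d2, round(rho, 2))  # 97.5 0.41
```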
Activity 12.1
Calculate the Spearman-Brown correlation coefficient of the scores of some
students in two subjects, X and Y, given below.
X 47 71 52 48 35 35 41 82 72 56 59 73 60 55 41
Y 75 79 85 50 49 59 75 91 100 87 70 92 54 75 68
So far, you have gone through correlation coefficients which make use of two
variables that are measured on continuous scales. But sometimes you
may be confronted with a situation where one variable is continuous,
measured on an interval or ratio scale, and the other
variable is dichotomous. A genuine or naturally dichotomized variable has
only two possible values, such as male-female, graduate-non-graduate,
married-unmarried, urban-rural, smoker-non-smoker, good-bad, old-young,
fat-thin, long-short etc. These are measured on a nominal scale. Therefore,
when you have a continuous variable, such as a school achievement test score,
versus a naturally dichotomized variable, the correlation coefficient to use is the
point-biserial correlation coefficient.
Example 12.2
Calculate the point-biserial correlation coefficient from a continuous and a
genuine dichotomous variable given below.
Individual           1   2   3   4   5   6   7   8   9  10  11  12  13  14
Achievement test    60  56  51  58  49  48  55  45  47  55  45  50  52  61
Sex (M = 1, F = 0)   1   1   1   0   1   0   0   0   1   0   1   1   1   1
Steps to follow:
  X      X - X̄     (X - X̄)²
 60       7.7       59.29
 56       3.7       13.69
 51      -1.3        1.69
 58       5.7       32.49
 49      -3.3       10.89
 48      -4.3       18.49
 55       2.7        7.29
 45      -7.3       53.29
 47      -5.3       28.09
 55       2.7        7.29
 45      -7.3       53.29
 50      -2.3        5.29
 52      -0.3        0.09
 61       8.7       75.69
ΣX = 732           Σ(X - X̄)² = 366.86
X̄ = 52.3
Note that the point-biserial correlation coefficient, like all other types of
correlation coefficient, has a theoretical range of +1 to -1. But the size of the
coefficient here is dependent upon the proportions p and q in the two
categories of the dichotomous variable.
The rpbi can reach 1 only when p and q are both 0.50, that is, when p = q. If the
proportions differ from 0.50, it is mathematically impossible for the rpbi to
reach 1. Other measures of association or correlation coefficients will be
discussed later.
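The computation for Example 12.2 can be sketched in Python. The formula itself is not reproduced in this section, so the sketch assumes the common textbook form rpbi = ((mean of group 1 - mean of group 0) / S) x sqrt(pq), where S is the standard deviation of all the scores (divisor n) and p and q are the proportions of cases coded 1 and 0. The function and variable names are illustrative.

```python
from math import sqrt


def point_biserial(scores, groups):
    """Point-biserial r for a continuous variable and a 0/1 coded dichotomy.

    Assumed formula: (mean_1 - mean_0) / S * sqrt(p * q), with S computed over
    all scores using divisor n.
    """
    n = len(scores)
    mean_all = sum(scores) / n
    s = sqrt(sum((x - mean_all) ** 2 for x in scores) / n)
    ones = [x for x, g in zip(scores, groups) if g == 1]
    zeros = [x for x, g in zip(scores, groups) if g == 0]
    p = len(ones) / n          # proportion coded 1
    q = 1 - p                  # proportion coded 0
    mean_1 = sum(ones) / len(ones)
    mean_0 = sum(zeros) / len(zeros)
    return (mean_1 - mean_0) / s * sqrt(p * q)


achievement = [60, 56, 51, 58, 49, 48, 55, 45, 47, 55, 45, 50, 52, 61]
sex = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # M = 1, F = 0
r_pbi = point_biserial(achievement, sex)
print(round(r_pbi, 3))
```

Under these assumptions the coefficient for the data of Example 12.2 comes out very close to zero (about 0.01), because the mean scores of the two groups are almost equal.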
Activity 12.2
The scores of boys and girls in an integrated science performance test for 20
students are given in a table below. Compute the point biserial correlation
coefficient.
S/No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Score 70 53 37 68 45 56 38 32 61 48 40 60 30 52 49 63 55 44 62 47
Sex A G B G B G B B G G B B B G B B G G B B
4.0 Conclusion
5.0 Summary
In this unit you have examined two other correlation coefficients which have
been developed for use with particular types of data. These correlation
coefficients are derivations of the Pearson product-moment correlation
coefficient, which is the most widely used. The Spearman-Brown rank-order
correlation coefficient is designed for use with naturally ordinal data or with
interval data which have been expressed as ranks. The point-biserial
correlation coefficient is designed for use when the data come
from one continuous variable and one naturally dichotomous
variable. It assumes that the dichotomous variable is genuine.
S/No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
X 12 19 2 17 11 18 6 15 10 3 14 1 9 16 13 4 8 20 7 5
Y 35 30 35 25 23 22 40 25 37 33 30 40 27 33 37 32 33 28 31 29
7.0 References:
Ary, D. and Jacobs, L. C. (1976). Introduction to Statistics: Purposes and
Procedures. New York: Holt, Rinehart and Winston.
Unit 13
STANDARD SCORES
1.0 Introduction
2.0 Objectives
You have seen that comparing students using raw scores is not correct.
Standard scores are used very effectively to bring students' scores onto the
same scale and therefore to compare students' performances. The
conversion to standard scores takes into account some important
considerations. It is used to overcome the anomaly involved in the ranking of
raw scores. Because question papers differ in strength, and because the
items on different question papers differ in difficulty, the standardization
of scores is very necessary.
i. They are a direct transformation of raw scores and therefore reflect the
magnitude of a score.
ii. Since they are interval measures, they can be used in a wide variety of
mathematical computations.
The z-score is the basic standard score. It is a type of standard score norm in
which the raw score, the mean and the standard deviation are all considered in
the process. It expresses the deviation of a raw score from the mean in units of
the standard deviation. It is symbolized by the small letter z, and the
formula is:

\[ z = \frac{X - \bar{X}}{S} \]

where X is the raw score, \bar{X} is the mean and S is the standard deviation.
Note that the mean is the reference point and the standard deviation is the
basic unit for measuring distance from that point. A distribution of z-scores
has a mean of 0 and a standard deviation of 1.
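Why a z-score distribution has a mean of 0 and a standard deviation of 1 can be shown in a few lines. The derivation below is a sketch which assumes that S is computed with divisor n, as in the worked example of this unit; the subscript i and the symbols \bar{z} and S_z are notation introduced here for the mean and standard deviation of the z-scores.

```latex
\begin{align*}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} \frac{X_i - \bar{X}}{S}
         = \frac{1}{nS}\Big(\sum_{i=1}^{n} X_i - n\bar{X}\Big) = 0, \\
S_z^2   &= \frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})^2
         = \frac{1}{nS^2}\sum_{i=1}^{n} (X_i - \bar{X})^2
         = \frac{S^2}{S^2} = 1 .
\end{align*}
```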
Example 13.1
Given the scores of 10 students in a test as follows 40, 50, 80, 60, 30, 25, 90,
75, 40, 60. Transform the scores to z-scores.
Steps to follow.
S/No     X     X - X̄     (X - X̄)²
  1      40     -15        225
  2      50      -5         25
  3      80      25        625
  4      60       5         25
  5      30     -25        625
  6      25     -30        900
  7      90      35       1225
  8      75      20        400
  9      40     -15        225
 10      60       5         25
       ΣX = 550          Σ(X - X̄)² = 4300
       X̄ = 55.0
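\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{4300}{10}} = 20.74 \]

Using \( z = \frac{X - \bar{X}}{S} \), we have

\begin{align*}
1.\quad  z_{40} &= \frac{40 - 55}{20.74} = \frac{-15}{20.74} = -0.723 \\
2.\quad  z_{50} &= \frac{50 - 55}{20.74} = \frac{-5}{20.74} = -0.241 \\
3.\quad  z_{80} &= \frac{80 - 55}{20.74} = \frac{25}{20.74} = 1.205 \\
4.\quad  z_{60} &= \frac{60 - 55}{20.74} = \frac{5}{20.74} = 0.241 \\
5.\quad  z_{30} &= \frac{30 - 55}{20.74} = \frac{-25}{20.74} = -1.205 \\
6.\quad  z_{25} &= \frac{25 - 55}{20.74} = \frac{-30}{20.74} = -1.446 \\
7.\quad  z_{90} &= \frac{90 - 55}{20.74} = \frac{35}{20.74} = 1.688 \\
8.\quad  z_{75} &= \frac{75 - 55}{20.74} = \frac{20}{20.74} = 0.964 \\
9.\quad  z_{40} &= \frac{40 - 55}{20.74} = \frac{-15}{20.74} = -0.723 \\
10.\quad z_{60} &= \frac{60 - 55}{20.74} = \frac{5}{20.74} = 0.241
\end{align*}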
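The steps of Example 13.1 can be sketched in Python as a check on the hand computation. The function name `z_scores` is illustrative, and the standard deviation uses divisor n, as in the example. Because the code does not round S to 20.74 before dividing, an occasional third decimal place may differ from the hand-computed value.

```python
from math import sqrt


def z_scores(scores):
    """Convert raw scores to z-scores: z = (X - mean) / S, with S computed using divisor n."""
    n = len(scores)
    mean = sum(scores) / n
    s = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / s for x in scores]


scores = [40, 50, 80, 60, 30, 25, 90, 75, 40, 60]
z = z_scores(scores)
for x, zx in zip(scores, z):
    print(x, round(zx, 3))
```

Scores below the mean of 55 come out with negative z-scores, and scores above it with positive z-scores.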
Activity 13.1
Calculate the z-scores of the following set of scores: 60, 90, 50, 72, 30, 88,
70, 65, 40, and 75.
3.3 T-Score
From the example above, you have seen that a z-score of 1.205 means that
the raw score is located 1.205 standard deviations above the
mean. In the same way, a z-score of -1.446 indicates that the score is located
1.446 standard deviations below the mean. You have also noticed that
z-scores are small numbers which are often negative or fractional.
A z-score distribution can therefore be transformed to a new
distribution in which there are no decimal points or negative values. The
decimal point is eliminated by multiplying each z-score by some convenient
constant, while the minus sign is eliminated by adding another constant to
each z-score. One of the most popular and convenient transformations is to
convert the z-scores to a distribution which has a mean of 50 and a standard
deviation of 10. This is called the T-score or Z-score. It is given by the formula

\[ T = 10z + 50 \quad \text{or} \quad T = 10\left(\frac{X - \bar{X}}{S}\right) + 50 \]
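As a sketch of this formula, the Python snippet below converts the ten raw scores of Example 13.1 (40, 50, 80, 60, 30, 25, 90, 75, 40, 60) to T-scores. The function name `t_scores` is illustrative, and S is computed with divisor n as in that example.

```python
from math import sqrt


def t_scores(scores):
    """Convert raw scores to T-scores: T = 10 * (X - mean) / S + 50."""
    n = len(scores)
    mean = sum(scores) / n
    s = sqrt(sum((x - mean) ** 2 for x in scores) / n)   # divisor n
    return [10 * (x - mean) / s + 50 for x in scores]


raw = [40, 50, 80, 60, 30, 25, 90, 75, 40, 60]
t = t_scores(raw)
for x, tx in zip(raw, t):
    print(x, round(tx, 2))
```

Whatever the raw scores are, the resulting T-scores have a mean of 50 and a standard deviation of 10, so a raw score above the mean always receives a T-score above 50.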
Example 13.2
Transform the raw scores below to T-scores. 45, 50, 80, 68, 10, 42, 65, 50,
25, 70.
Steps to follow
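S/No     X     X - X̄      (X - X̄)²
  1      45     -5.5        30.25
  2      50     -0.5         0.25
  3      80     29.5       870.25
  4      68     17.5       306.25
  5      10    -40.5      1640.25
  6      42     -8.5        72.25
  7      65     14.5       210.25
  8      50     -0.5         0.25
  9      25    -25.5       650.25
 10      70     19.5       380.25
       ΣX = 505           Σ(X - X̄)² = 4160.50
       X̄ = 50.5

\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{4160.50}{10}} = 20.40 \]

Using \( T = 10\left(\frac{X - \bar{X}}{S}\right) + 50 \), we have

\begin{align*}
1.\quad  T_{45} &= 10\left(\frac{45 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-5.5}{20.40}\right) + 50 = 47.30 \\
2.\quad  T_{50} &= 10\left(\frac{50 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-0.5}{20.40}\right) + 50 = 49.75 \\
3.\quad  T_{80} &= 10\left(\frac{80 - 50.5}{20.40}\right) + 50 = 10\left(\frac{29.5}{20.40}\right) + 50 = 64.46 \\
4.\quad  T_{68} &= 10\left(\frac{68 - 50.5}{20.40}\right) + 50 = 10\left(\frac{17.5}{20.40}\right) + 50 = 58.58 \\
5.\quad  T_{10} &= 10\left(\frac{10 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-40.5}{20.40}\right) + 50 = 30.15 \\
6.\quad  T_{42} &= 10\left(\frac{42 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-8.5}{20.40}\right) + 50 = 45.83 \\
7.\quad  T_{65} &= 10\left(\frac{65 - 50.5}{20.40}\right) + 50 = 10\left(\frac{14.5}{20.40}\right) + 50 = 57.11 \\
8.\quad  T_{50} &= 10\left(\frac{50 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-0.5}{20.40}\right) + 50 = 49.75 \\
9.\quad  T_{25} &= 10\left(\frac{25 - 50.5}{20.40}\right) + 50 = 10\left(\frac{-25.5}{20.40}\right) + 50 = 37.50 \\
10.\quad T_{70} &= 10\left(\frac{70 - 50.5}{20.40}\right) + 50 = 10\left(\frac{19.5}{20.40}\right) + 50 = 59.56
\end{align*}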
Activity 13.2
Calculate the T-scores for the data below.
50, 13, 80, 45, 60, 65, 70, 22, 38, 55, 18, 80, 75, 84, 42

3.4 Stanine Scores
You are familiar with some of the examination bodies that operate at the
ordinary level in this country, Nigeria. Some of these examination bodies,
like the West African Examinations Council (WAEC), the National Examinations
Council (NECO) and the National Business and Technical Examinations Board
(NABTEB), among others, use a type of transformation of raw scores called
stanine scores. These are a standard score system which provides a
single-digit score scale running from 1 to 9. In other words, stanines are
number grades ranging from 1 to 9, and the percentages of cases in stanines
1 to 9 are 4, 7, 12, 17, 20, 17, 12, 7 and 4 respectively. This gives a normal
distribution where the highest score corresponds to a stanine of 9 and the
least to a stanine of 1. The mean is assigned a value of 5 and the standard
deviation a value of 2. This system was originally used by the Air Force
during World War II.
Stanine simply means scores with nine categories, 1 to 9. The lowest
stanine, 1, represents a score that is 2 or more standard deviations below the
mean, while the highest stanine, 9, represents a score that is 2 or more
standard deviations above the mean. But WAEC uses a reversed type of stanine.
For WAEC, a stanine of 1 represents the highest score. Thus A1 = stanine 1,
B2 = stanine 2, B3 = stanine 3, C4 = stanine 4, C5 = stanine 5, C6 = stanine 6,
E7 = stanine 7, P8 = stanine 8 and F9 = stanine 9. This system has a great
advantage because it can be used to compare results from a very wide
population whose characteristics may not be the same.
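The stanine percentages (4, 7, 12, 17, 20, 17, 12, 7 and 4) can be turned into a rule for assigning stanines. The Python sketch below assumes one common way of doing this, which is not spelled out in this unit: each score is given a percentile rank (the percentage of scores below it plus half of the percentage equal to it), and the cumulative percentages 4, 11, 23, 40, 60, 77, 89 and 96 serve as the boundaries between stanines. It uses the ordinary scale in which 9 is the highest stanine, not the reversed WAEC scale. The names `stanines` and `demo_scores` are illustrative, and the demonstration data are artificial.

```python
from bisect import bisect_right
from collections import Counter

PERCENTAGES = [4, 7, 12, 17, 20, 17, 12, 7, 4]
# Cumulative upper boundaries of stanines 1 to 8: 4, 11, 23, 40, 60, 77, 89, 96.
CUTOFFS = [sum(PERCENTAGES[:k]) for k in range(1, 9)]


def stanines(scores):
    """Assign a stanine (1 = lowest, 9 = highest) to each score from its percentile rank."""
    n = len(scores)
    counts = Counter(scores)
    result = []
    for x in scores:
        below = sum(c for value, c in counts.items() if value < x)
        percentile_rank = 100.0 * (below + 0.5 * counts[x]) / n
        result.append(bisect_right(CUTOFFS, percentile_rank) + 1)
    return result


# Artificial demonstration data: 100 distinct scores, 1 to 100.
demo_scores = list(range(1, 101))
demo_stanines = stanines(demo_scores)
stanine_counts = [demo_stanines.count(s) for s in range(1, 10)]
print(stanine_counts)  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```

With 100 distinct scores, the number of scores falling in each stanine reproduces the percentages listed for the stanine scale.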
Activity 13.3
Make a collection of Examination bodies you know and examine how they
treat their raw scores. Compare them with that of WAEC.
4.0 Conclusion
In this unit, you have seen that it is wrong to compare students with the use
of raw scores. You have therefore been exposed to some of the most popular
standard scores which you can use at any time in comparing your students.
As a teacher, you will have to use them most of the time. You therefore need
to get used to them now.
5.0 Summary
In this unit you have learnt that the standard scores are a direct transformation
of raw scores. Differences in standard scores have the same meaning in any
part of a distribution. You have also noted that standard scores tell us the
number of standard deviations above or below the mean a score is located.
The z-score is the basic standard score, with a mean of 0 and a standard
deviation of 1. The characteristics are:
The T-score is a conversion of the z-score which eliminates both the negative
values and the decimal points.
Calculate the z-scores and T-scores of the data below.
32, 15, 28, 23, 18, 27, 29, 25, 21, 35.
7.0 References:
Ary, D. and Jacobs, L. C. (1976). Introduction to Statistics: Purposes and
Procedures. New York: Holt, Rinehart and Winston.