Quantitative Techniques: Introduction
"The collection, presentation, analysis and interpretation of the numerical data." This definition
clearly points out four stages in a statistical investigation, namely:
a) To present the data in a concise and definite form: Statistics helps in classifying and tabulating
raw data for processing and further tabulation for end users.
b) To make it easy to understand complex and large data: This is done by presenting the data in
the form of tables, graphs, diagrams etc., or by condensing the data with the help of means,
dispersion etc.
c) For comparison: Tables, measures of means and dispersion can help in comparing different
sets of data.
d) In forming policies: It helps in forming policies like a production schedule, based on the
relevant sales figures. It is used in forecasting future demands.
e) In measuring the magnitude of a phenomenon: Statistics has made it possible to measure, in
numbers, the population of a country, industrial growth, agricultural growth, educational levels
and so on.
1. Statistics does not deal with individual measurements. Since statistics deals with aggregates of
facts, it cannot be used to study the changes that have taken place in individual cases. For
example, the wages earned by a single industry worker at any time, taken by itself is not a
statistical datum. But the wages of workers of that industry can be used statistically. Similarly,
the marks obtained by Kamau of your class or the height of Atieno (also of your class) are not
the subject matter of statistical study. But the average marks or the average height of your class
has statistical relevance.
2. Statistics cannot be used to study qualitative phenomena such as morality, intelligence, beauty,
etc., as these cannot be quantified directly. However, it may be possible to analyze such problems
statistically by expressing them numerically. For example, we may study the intelligence of boys
on the basis of the marks obtained by them in an examination.
3. Statistical results are true only on an average: The conclusions obtained statistically are not
universal truths. They are true only under certain conditions. This is because statistics, as a
science, is less exact than the natural sciences.
4. Statistical data, being approximations, are mathematically incorrect. Therefore, they can be
used only if mathematical accuracy is not needed.
5. Statistics, being dependent on figures, can be manipulated and therefore can be used only
when the authenticity of the figures has been proved beyond doubt.
It is often said that "statistics can prove anything." There are three types of lies - lies,
damned lies and statistics - wicked in the order of their naming.
Thus, by "distrust of statistics" we mean a lack of confidence in statistical statements and methods.
The wrong representation of even correct figures can mislead a reader. For example, John
earned Ksh 400,000 in 1990 - 1991 and Jane earned Ksh 500,000. Reading this one would form
the opinion that Jane is decidedly a better worker than John. However, if we carefully examine
the statement, we might reach a different conclusion as Jane’s earning period is unknown to us.
Thus, while working with statistics, one should not only avoid outright falsehoods but be alert to
detect possible distortion of the truth.
Broadly speaking, statistics may be divided into two categories, i.e., descriptive and inferential
statistics.
When analyzing data, for example, the marks achieved by 100 students for a piece of
coursework, it is possible to use both descriptive and inferential statistics in your analysis of their
marks. Typically, in most research conducted on groups of people, you will use both descriptive
and inferential statistics to analyze the results and draw conclusions. So, what are descriptive and
inferential statistics? And what are their differences?
Descriptive statistics is the term given to the analysis of data that helps describe, show or
summarize data in a meaningful way such that, for example, patterns might emerge from the
data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we
have analyzed or reach conclusions regarding any hypotheses we might have made. They are
simply a way to describe our data.
Descriptive statistics are very important, because if we simply presented our raw data, it would
be hard to visualize what the data was showing, especially if there was a lot of it. Descriptive
statistics therefore, allow us to present the data in a more meaningful way which allows simpler
interpretation of the data. For example, if we had the results of 100 pieces of students'
coursework, we may be interested in the overall performance of those students. We would also
be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this.
Typically, there are two general types of statistic that are used to describe data:
Measures of central tendency: these are ways of describing the central position of a frequency
distribution for a group of data. In this case, the frequency distribution is simply the distribution
and pattern of marks scored by the 100 students from the lowest to the highest. We can describe
this central position using a number of statistics, including the mode, median, and mean.
Measures of spread: these are ways of summarizing a group of data by describing how spread
out the scores are. For example, the mean score of our 100 students may be 65 out of 100.
However, not all students will have scored 65 marks. Rather, their scores will be spread out.
Some will be lower and others higher. Measures of spread help us to summarize how spread out
these scores are. To describe this spread, a number of statistics are available to us, including the
range, quartiles, absolute deviation, variance and standard deviation.
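As an illustrative sketch only (the marks below are made up, not taken from the text), Python's standard statistics module can compute these descriptive measures:

import statistics

# Hypothetical coursework marks for ten students (illustration only)
marks = [45, 52, 58, 61, 65, 65, 68, 72, 75, 89]

print("Mean:", statistics.mean(marks))            # central tendency: arithmetic average
print("Median:", statistics.median(marks))        # central tendency: middle value
print("Mode:", statistics.mode(marks))            # central tendency: most frequent value
print("Range:", max(marks) - min(marks))          # spread: highest minus lowest
print("Variance:", statistics.variance(marks))    # spread: sample variance
print("Std dev:", statistics.stdev(marks))        # spread: sample standard deviation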
When we use descriptive statistics, it is useful to summarize our group of data using a
combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts)
and statistical commentary (i.e., a discussion of the results).
Whilst descriptive statistics examine our immediate group of data (for example, the 100 students'
marks), inferential statistics aim to make inferences from this data in order to make conclusions
that go beyond this data. In other words, inferential statistics are used to make inferences about a
population from a sample in order to generalize (make assumptions about this wider population)
and/or make predictions about the future.
For example, a Board of Examiners may want to compare the performance of 1000 students that
completed an examination. Of these, 500 students are girls and 500 students are boys. The 1000
students represent our "population". Whilst we are interested in the performance of all 1000
students, girls and boys, it may be impractical to examine the marks of all of these students
because of the time and cost required to collate all of their marks. Instead, we can choose to
examine a "sample" of these students and then use the results to make generalizations about the
performance of all 1000 students. For the purpose of our example, we may choose a sample size
of 200 students. Since we are looking to compare boys and girls, we may randomly select 100
girls and 100 boys in our sample. We could then use this, for example, to see if there are any
statistically significant differences in the mean mark between boys and girls, even though we
have not measured all 1000 students.
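As a minimal sketch of such a comparison (assuming SciPy is available; the marks are randomly generated stand-ins, not real examination data), an independent two-sample t-test could be used:

import random
from scipy import stats

random.seed(1)
# Hypothetical samples of marks out of 100 for the 100 girls and 100 boys
girls_marks = [random.gauss(66, 10) for _ in range(100)]
boys_marks = [random.gauss(63, 10) for _ in range(100)]

# Independent two-sample t-test on the mean marks of the two samples
t_stat, p_value = stats.ttest_ind(girls_marks, boys_marks)
print("t statistic:", round(t_stat, 3))
print("p value:", round(p_value, 4))
if p_value < 0.05:
    print("The difference in mean marks is statistically significant at the 5% level.")
else:
    print("No statistically significant difference at the 5% level.")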
1. Bias: - Bias means prejudice or preference of the investigator, which creeps in consciously or
unconsciously when proving a particular point.
2. Generalization: - Sometimes, on the basis of the little data available, one may jump to a
conclusion, which leads to erroneous results.
1.7.1 Discrete Variable
A discrete variable is one that cannot take on all values within the limits of the variable. For
example, responses to a five-point rating scale can only take on the values 1, 2, 3, 4, and 5. The
variable cannot have the value 1.7. A variable such as a person's height can take on any value.
Variables that can take on any value and therefore are not discrete are called continuous.
Statistics from computed discrete variables have many more possible values than the discrete
variables themselves. The mean on a five-point scale could be 3.117 even though 3.117 is not
possible for an individual score.
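For instance (a made-up illustration), the mean of whole-number responses on a five-point scale is usually not itself a whole number:

from statistics import mean

# Hypothetical five-point scale responses; each individual score is discrete
ratings = [3, 4, 2, 5, 3, 3, 4, 1, 3, 3]
print(mean(ratings))   # 3.1 -- a value no single respondent could actually give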
1.7.2 Continuous Variable
A continuous variable is one for which, within the limits the variable ranges, any value is
possible. For example, the variable "Time to solve an anagram problem" is continuous since it
could take 2 minutes, 2.13 minutes etc. to finish a problem. The variable "Number of correct
answers on a 100-point multiple-choice test" is not a continuous variable since it is not possible
to get 54.12 problems correct. A variable that is not continuous is called "discrete".
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher
values. However, the intervals between the numbers are not necessarily equal. For example, on a
five-point rating scale measuring attitudes toward gun control, the difference between a rating of
2 and a rating of 3 may not represent the same difference as the difference between a rating of 4
and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen
arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It
could just as well have been 0 or -5.
On interval measurement scales, one unit on the scale represents the same magnitude on the trait
or characteristic being measured across the whole range of the scale. For example, if anxiety
were measured on an interval scale, then a difference between a score of 10 and a score of 11
would represent the same difference in anxiety as would a difference between a score of 50 and a
score of 51. Interval scales do not have a "true" zero point, however, and therefore it is not
possible to make statements about how many times higher one score is than another. For the
anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious
as a person with a score of 15. True interval measurement is somewhere between rare and
nonexistent in the behavioral sciences. No interval-level scale of anxiety such as the one
described in the example actually exists. A good example of an interval scale is the Fahrenheit
scale for temperature. Equal differences on this scale represent equal differences in temperature,
but a temperature of 30 degrees is not twice as warm as one of 15 degrees.
1.8.4 Ratio Scale
Ratio scales are like interval scales except they have true zero points. A good example is the
Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin
is twice as high as a temperature of 150 Kelvin.
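A short numerical sketch of this point (using the temperatures quoted above): equal differences are meaningful on both scales, but ratios are only meaningful on the Kelvin scale, which has a true zero.

def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to Kelvin."""
    return (f - 32) * 5 / 9 + 273.15

# Interval scale: 30 degrees F is NOT twice as warm as 15 degrees F
print(fahrenheit_to_kelvin(30) / fahrenheit_to_kelvin(15))   # about 1.03, not 2

# Ratio scale: 300 K really is twice as high as 150 K, because 0 K is a true zero
print(300 / 150)   # 2.0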
ASSIGNMENT 1
1. Define statistics
For any statistical enquiry, the basic objective is to collect facts and figures relating to a
particular phenomenon for further statistical analysis. The process of counting, enumeration or
measurement together with systematic recording of results is called collection of statistical data.
Primary data is data that you collect yourself using such methods as:
direct observation - lets you focus on details of importance to you; lets you see a system in real
rather than theoretical use (some faults seem unlikely or trivial in theory but are quite real and
annoying in practice);
surveys - written surveys let you collect considerable quantities of detailed data. You have to
either trust the honesty of the people surveyed or build in self-verifying questions (e.g., questions
9 and 24 ask basically the same thing but using different words - different answers may indicate
the surveyed person is being inconsistent, dishonest or inattentive).
interviews - slow, expensive, and they take people away from their regular jobs, but they allow
in-depth questioning and follow-up questions. They also show non-verbal communication such
as face-pulling, fidgeting, shrugging, hand gestures, sarcastic expressions that add further
meaning to spoken words. e.g., "I think it's a GREAT system" could mean vastly different things
depending on whether the person was sneering at the time! A problem with interviews is that
people might say what they think the interviewer wants to hear; they might avoid being honestly
critical in case their jobs or reputation might suffer.
logs (e.g., fault logs, error logs, complaint logs, transaction logs). Good, empirical, objective
data sources (usually, if they are used well). Can yield lots of valuable data about system
performance over time under different conditions.
Primary data can be relied on because you know where it came from and what was done to it. It's
like cooking something yourself. You know what went into it.
There's a lot more secondary data than primary data, and secondary data is a whole lot cheaper
and easier to acquire than primary data. The problem is that often the reliability, accuracy and
integrity of the data is uncertain. Who collected it? Can they be trusted? Did they do any
preprocessing of the data? Is it biased? How old is it? Where was it collected? Can the data be
verified, or does it have to be taken on faith?
Often secondary data has been pre-processed to give totals or averages and the original details
are lost so you can't verify it by replicating the methods used by the original data collectors.
In short, primary data is expensive and difficult to acquire, but it's trustworthy. Secondary data is
cheap and easy to collect, but must be treated with caution.
In primary data collection, you collect the data yourself using methods such as interviews and
questionnaires. The key point here is that the data you collect is unique to you and your research
and, until you publish, no one else has access to it.
There are many methods of collecting primary data and the main methods include:
questionnaires
interviews
focus group interviews
observation
case-studies
diaries
critical incidents
Portfolios.
The primary data, which is generated by the above methods, may be qualitative in nature
(usually in the form of words) or quantitative (usually in the form of numbers or where you can
make counts of words used). We briefly outline these methods but you should also read around
the various methods.
Questionnaires
Questionnaires are a popular means of collecting data, but are difficult to design and often
require many rewrites before an acceptable questionnaire is produced.
Advantages:
Can be used as a method in its own right or as a basis for interviewing or a telephone
survey.
Can be posted, e-mailed or faxed.
Can cover a large number of people or organizations.
Wide geographic coverage.
Relatively cheap.
No prior arrangements are needed.
Avoids embarrassment on the part of the respondent.
Respondent can consider responses.
Possible anonymity of respondent.
No interviewer bias.
Disadvantages:
Design problems.
Questions have to be relatively simple.
Historically low response rate (although inducements may help).
Time delay whilst waiting for responses to be returned.
Require a return deadline.
Several reminders may be required.
Assumes no literacy problems.
No control over who completes it.
Not possible to give assistance if required.
Problems with incomplete questionnaires.
Replies not spontaneous and independent of each other.
Respondent can read all questions beforehand and then decide whether to complete the questionnaire
or not, for example, because it is too long, too complex, uninteresting, or too personal.
Interviews
Personal interview
Advantages:
Case-studies
The term case-study usually refers to a fairly intensive examination of a single unit such as a
person, a small group of people, or a single company. Case-studies involve measuring what is
there and how it got there. In this sense, it is historical. It can enable the researcher to explore,
unravel and understand problems, issues and relationships. It cannot, however, allow the
researcher to generalize, that is, to argue that from one case-study the results, findings or theory
developed apply to other similar case-studies. The case looked at may be unique and, therefore
not representative of other instances. It is, of course, possible to look at several case-studies to
represent certain features of management that we are interested in studying. The case-study
approach is often done to make practical improvements. Contributions to general knowledge are
incidental.
3. Test hypotheses. The background information collected will have been analysed for possible
hypotheses. In this step, specific evidence about each hypothesis can be gathered. This step aims
to eliminate possibilities which conflict with the evidence collected and to gain confidence for
the important hypotheses. The culmination of this step might be the development of an
experimental design to test out more rigorously the hypotheses developed, or it might be to take
action to remedy the problem.
4. Take remedial action. The aim is to check that the hypotheses tested actually work out in
practice. Some action, correction or improvement is made and a re-check carried out on the
situation to see what effect the change has brought about.
The case-study enables rich information to be gathered from which potentially useful hypotheses
can be generated. It can be a time-consuming process. It is also inefficient in researching
situations which are already well structured and where the important variables have been
identified. Case-studies lack utility when attempting to reach rigorous conclusions or to determine
precise relationships between variables.
Diaries
A diary is a way of gathering information about the way individuals spend their time on
professional activities. They are not about records of engagements or personal journals of
thought! Diaries can record either quantitative or qualitative data, and in management research
can provide information about work patterns and activities.
Advantages:
Disadvantages:
Subjects need to be clear about what they are being asked to do, why and what you plan
to do with the data.
Diarists need to be of a certain educational level.
Some structure is necessary to give the diarist focus, for example, a list of headings.
Encouragement and reassurance are needed as completing a diary is time-consuming and
can be irritating after a while.
Progress needs checking from time-to-time.
Confidentiality is required as content may be critical.
Analysis problems - you need to consider how responses will be coded before the
subjects start filling in diaries.
Portfolios
A measure of a manager’s ability may be expressed in terms of the number and duration of
‘issues’ or problems being tackled at any one time. The compilation of problem portfolios is
recording information about how each problem arose, methods used to solve it, difficulties
encountered, etc. This analysis also raises questions about the person’s use of time. What
proportion of time is occupied in checking; in handling problems given by others; on self-
generated problems; on ‘top-priority’ problems; on minor issues, etc? The main problem with
this method and the use of diaries is getting people to agree to record everything in sufficient
detail for you to analyze. It is very time-consuming!
2.3 Sampling
Collecting data is time consuming and expensive, even for relatively small amounts of data.
Hence, it is highly unlikely that a complete population will be investigated. Because of the time
and cost elements the amount of data you collect will be limited and the number of people or
organizations you contact will be small in number. You will, therefore, have to take a sample and
usually a small sample.
Sampling theory says a correctly taken sample of an appropriate size will yield results that can be
applied to the population as a whole. There is a lot in this statement but the two fundamental
questions to ensure generalization are: how should the sample be taken, and how big should the
sample be?
The answer to the second question is 'as large as possible given the circumstances'. It is like
answering the question 'How long is a piece of string?' It all depends on the circumstances.
Whilst we do not expect you to normally generalize your results and take a large sample, we do
expect that you follow a recognized sampling procedure, such that, if the sample was increased
generalization would be possible. You therefore need to know some of the basics of sampling.
This will be done by reference to the following example: suppose a sample of nine people is to be
selected from a group of 90.
The theory of sampling is based on random samples – where all items in the population have the
same chance of being selected as sample units. Random samples can be drawn in a number of
ways but are usually based on having some information about population members. This
information is usually in the form of an alphabetical list – called the sampling frame. Three types
of random sample can be drawn – a simple random sample (SRS), a stratified sample and a
systematic sample.
Simple random sampling can be carried out in two ways – the lottery method and using random
numbers. The lottery method involves:
transferring each person's name from the list and putting it on a piece of paper
the pieces of paper are placed in a container and thoroughly mixed
the required number are selected by someone without looking
the names selected are the simple random sample.
This is basically similar to a game of bingo or the national lottery. This procedure is easy to carry
out especially if both population and sample are small, but can be tedious and time consuming
for large populations or large samples.
Alternatively random numbers can be used. Random numbers are strings of digits that have been
generated by the lottery method and can be found in books of statistical tables. An example of
these is:
03 47 43 73 86 36 96 47 36 61
97 74 24 67 62 42 81 14 57 20
16 76 62 27 66 56 50 26 71 07
12 56 85 99 26 96 96 68 27 31
55 59 56 35 64 38 04 80 46 22
Random numbers tend to be written in pairs and blocks of 5 by 5 to make reading easy.
However, care is needed when reading these tables. The numbers can be read in any direction, but
they should be read as a single string of digits, i.e., left to right as 0, 3, 4, 7, etc., or top to bottom
as 0, 9, 1, 1, 5, 3, 7, etc. It is usual to read left to right.
The procedure is:
Allocate a number to each person on the list (each number must consist of the same
number of digits so that the tables can be read consistently).
Find a starting point at random in the tables (close your eyes and point).
Read off the digits.
The names matching the numbers are the sample units.
a) The sampling frame is the list of 90 people. Number this list 00, 01, 02, …, 89. Note that each
number has two digits and the numbering starts from 00.
b) Suppose a starting point is found at random from the random number tables and let this
number be 16. Then the person that has been numbered 16 is the first sample unit.
c) Let the next two digits be 76, then the person numbered 76 is the second sample unit.
This procedure is repeated until the nine people have been identified.
d) Any number occurring for a second time is ignored, as is any two-digit number over 89.
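The same procedure can be sketched in Python (the names below are placeholders; in practice random.sample(frame, 9) does the whole job, but the loop mirrors the manual reading of two-digit random numbers):

import random

# Sampling frame: the 90 people, numbered 00 to 89 (placeholder names)
frame = ["Person %02d" % i for i in range(90)]

random.seed(0)   # fixed seed so the sketch is reproducible

# Read two-digit random numbers, ignoring repeats and numbers over 89,
# until nine distinct sample units have been identified.
sample_numbers = []
while len(sample_numbers) < 9:
    number = random.randint(0, 99)                      # a two-digit random number
    if number <= 89 and number not in sample_numbers:   # rule (d) above
        sample_numbers.append(number)

simple_random_sample = [frame[n] for n in sample_numbers]
print(simple_random_sample)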
Simple random number sampling is used as the basis for many other sampling methods, but has
two disadvantages:
To overcome the second problem above, a stratified sample can be taken. In this the population
structure is reflected in the sample structure, with respect to some criterion.
For example, suppose the 90 people consist of 30 men and 60 women. If gender is the criterion
for stratification, then the sample of nine should contain 30/90 × 9 = 3 men and 60/90 × 9 = 6 women.
The three men and six women would then be selected by simple random sampling, e.g., using
random numbers.
The problem with this approach is that the criterion for stratification (e.g., age, sex, job description)
is chosen by you – it is subjective and may not be the best or most appropriate criterion. Also, a
more detailed sampling frame is required.
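A sketch of the proportional allocation and selection for this example (the name lists are placeholders made up for illustration):

import random

random.seed(0)
men = ["Man %02d" % i for i in range(30)]        # stratum 1: 30 men
women = ["Woman %02d" % i for i in range(60)]    # stratum 2: 60 women

sample_size = 9
population = len(men) + len(women)               # 90

# Allocate the sample in proportion to each stratum's share of the population
n_men = round(sample_size * len(men) / population)    # 9 x 30/90 = 3
n_women = sample_size - n_men                         # 9 - 3 = 6

# Simple random sampling within each stratum
stratified_sample = random.sample(men, n_men) + random.sample(women, n_women)
print(stratified_sample)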
To take a systematic sample, the sampling frame is divided into groups of equal size, for example:
1 to 9
10 to 19
Etc.
80 to 89
The first sample unit is chosen at random from the first group; every tenth person thereafter is
then taken, so that the 16th, 26th, 36th, 46th, 56th, 66th, 76th, and 86th people are the remaining
sample units.
This approach usually generates a good cross-section of the population. However, when no
sampling frame exists, you may need a team of people to help with counting, interviewing, etc.
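A sketch of the systematic procedure for the same 90 people (interval of ten, with the random start chosen from the first group):

import random

frame = ["Person %02d" % i for i in range(90)]   # placeholder sampling frame

random.seed(0)
interval = 10                              # 90 people / sample of 9
start = random.randint(0, interval - 1)    # random start within the first group

# Take every tenth person from the random start onwards
systematic_sample = [frame[i] for i in range(start, len(frame), interval)]
print(systematic_sample)                   # nine sample units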
ASSIGNMENT 2
2. Discuss the various methods of data collection. Indicate the situations in which each of these
methods should be used.
3. What is sampling?
4. State four reasons why it is important to study a sample instead of the whole population.
3.1 Introduction
Graphs and diagrams leave a lasting impression on the mind and make the salient features of the
data intelligible and easily understandable. Forecasting also becomes easier with the help of
graphs. Thus, it is of interest to study the graphical representation of data.
3.2 General Principles of Constructing Diagrams
2. Each diagram must be given a clear, concise and suitable title without damaging clarity.
3. A proper proportion between height and width must be maintained in order to avoid an
unpleasant look.
4. Select a proper scale; it should be in even numbers or in multiples of five or ten, e.g., 25, 50,
75 or 10, 20, 30, 40, etc. There is, however, no fixed rule.
"The important point that must be borne in mind at all times that the pictorial representation
chosen for any situation must depict the true relationship and point out the proper conclusion.
Above all the chart must be honest.” .... C. W. LOWE.
A simple bar chart represents only one variable. For example, sales, production, population figures, etc. for various
years may be shown by simple bar charts. Since these are of the same width and vary only in
heights (or lengths), it becomes very easy for readers to study the relationship. Simple bar
diagrams are very popular in practice. A bar chart can be either vertical or horizontal; vertical
bars are more popular.
Illustration: - The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country         Birth rate (per thousand)
India           33
China           40
Germany         15
New Zealand     30
U.K.            20
Sweden          15
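As an illustrative sketch of one suitable diagram (assuming the matplotlib library is available), the birth-rate data above could be drawn as a simple bar chart:

import matplotlib.pyplot as plt

countries = ["India", "China", "Germany", "New Zealand", "U.K.", "Sweden"]
birth_rates = [33, 40, 15, 30, 20, 15]   # births per thousand, from the table above

plt.bar(countries, birth_rates)          # one bar per country, equal widths
plt.ylabel("Birth rate per thousand")
plt.title("Birth rates of different countries")
plt.show()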
TASK 1
Represent the above data by a suitable diagram and make conclusions about the birth rates.
While constructing a sub-divided (component) bar chart, the various components in each bar should be kept in the
same order. A common and helpful arrangement is that of presenting each bar in the order of
magnitude with the largest component at the bottom and the smallest at the top. The components
are shown with different shades or colors with a proper index.
Illustration: - During 1968 - 71, the number of students in University 'X' was as follows.
TASK 2
This method can be used for data which is made up of two or more components. In this method
the components are shown as separate adjoining bars. The height of each bar represents the
actual value of the component. The components are shown by different shades or colors. Where
changes in actual values of component figures only are required, multiple bar charts are used.
Illustration: - The table below gives data relating to the exports and imports of a certain country
X (in thousands of dollars) during the four years ending in 1930 - 31.
TASK 3
Deviation bars are used to represent net quantities - excess or deficit i.e., net profit, net loss, net
exports or imports, swings in voting etc. Such bars have both positive and negative values.
Positive values lie above the base line and negative values lie below it.
Illustration: -
TASK 4
i) Geometrically, it can be seen that the area of a sector of a circle, taken radially, is proportional
to the angle at its center. It is therefore sufficient to draw angles at the center proportional to the
original figures; this makes the areas of the sectors proportional to the basic figures.
For example, let the total be 1000 and one of the components be 200; then the angle will be
(200/1000) × 360° = 72°.
ii) When a statistical phenomenon is composed of numerous components
(say four or more), bar charts are not suitable to represent them because, in this
situation, they become very complex and their visual impact is lost. A pie diagram
is suitable for such situations. It is a circular diagram which is a circle (pie) divided by the radii,
into sectors (like slices of a cake or pie). The area of a sector is proportional to the size of each
component.
Pie charts are useful for comparing different parts of a whole amount. They are often used to present
financial information, e.g., a company's expenditure can be shown to be the sum of its parts,
including different expense categories such as salaries, borrowing interest, taxation and general
running costs (i.e., rent, electricity, heating etc.).
A pie chart is a circular chart in which the circle is divided into sectors. Each sector visually
represents an item in a data set to match the amount of the item as a percentage or fraction of the
total data set.
Illustration: A family's weekly expenditure on its house mortgage, food and fuel is as follows:
Expenditure     Ksh
Mortgage        300
Food            225
Fuel             75
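A sketch of how the sector angles are worked out and the chart drawn (assuming matplotlib is available), using the figures in the illustration above:

import matplotlib.pyplot as plt

# Weekly expenditure from the illustration above (Ksh)
items = {"Mortgage": 300, "Food": 225, "Fuel": 75}
total = sum(items.values())

# Sector angle for each component: (component / total) x 360 degrees
for name, amount in items.items():
    print(name, round(amount / total * 360), "degrees")   # 180, 135 and 45 degrees

plt.pie(list(items.values()), labels=list(items.keys()), autopct="%1.1f%%")
plt.title("Weekly household expenditure (Ksh)")
plt.show()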
TASK 5
3.5 Graphs
A graph is a visual representation of data by a continuous curve on a squared (graph) paper. Like
diagrams, graphs are also attractive, and eye-catching, giving a bird's eye-view of data and
revealing their inner pattern.
The commonly used graphs are:
1. Histogram
2. Frequency Polygon
3. Frequency Curve
3.5.1 Histogram
To construct a Histogram, the class intervals are plotted along the x-axis and corresponding
frequencies are plotted along the y - axis. The rectangles are constructed such that the height of
each rectangle is proportional to the frequency of the class and width is equal to the length of the
class. If all the classes have equal width, then all the rectangles stand on bases of equal width; in
the case of classes having unequal widths, the rectangles too stand on unequal bases. For open-ended
classes, the histogram is constructed after making certain assumptions. Since the rectangles are
adjacent, leaving no gaps, class intervals of the inclusive type must be adjusted at their end points
(converted to class boundaries).
For example, suppose that in a book sale you want to determine which books were most popular:
the high-priced books, the low-priced books, the books most neglected, etc. Let us say you sold a
total of 31 books at this book fair at the following prices.
11, $ 12, $ 12, $ 12, $ 14, $ 16, $ 18, $ 20, $ 24, $ 21, $ 22, $ 25.
The book prices range from $1 to $25. Divide this range into a number of groups, or class intervals.
Typically, between 5 and 20 class intervals are best for a frequency histogram.
Our first class interval includes the lowest price in the data and the last interval, of course,
includes the highest price. Also make sure that overlapping is avoided, so that no price falls
into two class intervals. For example, if the class intervals are 0-5, 5-10, 10-15 and so on, then
the price $10 falls in both 5-10 and 10-15. Instead, if we use $1 - $5, $6 - $10, and so on, the
class intervals will be mutually exclusive.
Note that each class interval is of equal width, i.e., $5 inclusive. Now we draw the frequency
histogram as below.
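As a sketch (assuming matplotlib is available, and using only the prices actually listed above rather than all 31 books), the histogram could be produced as follows:

import matplotlib.pyplot as plt

# Prices (in $) listed in the text above
prices = [11, 12, 12, 12, 14, 16, 18, 20, 24, 21, 22, 25]

# Class boundaries for the intervals $1-$5, $6-$10, $11-$15, $16-$20, $21-$25,
# chosen so that no price can fall into two classes
bins = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5]

plt.hist(prices, bins=bins, edgecolor="black")
plt.xlabel("Price ($)")
plt.ylabel("Number of books")
plt.title("Frequency histogram of book prices")
plt.show()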
Frequency distribution curves are like frequency polygons, except that instead of using straight
line segments, a smooth curve is used to connect the points. The frequency curve for the above
data is drawn in the same way.
When frequencies are added progressively, they are called cumulative frequencies. The curve
obtained by plotting cumulative frequencies is called a cumulative frequency curve or an ogive
(pronounced "ojive").
To construct an Ogive: -
1) Add up the progressive totals of frequencies, class by class, to get the cumulative frequencies.
2) Plot classes on the horizontal (x-axis) and cumulative frequencies on the vertical (y-axis).
Note that an ogive starts at zero on the vertical axis and ends at the outside class limit of the last
class; in most cases it looks like an 'S'.
Note that cumulative frequencies are plotted against the 'limits' of the classes to which they refer.
(A) Less than Ogive: - To plot a less than ogive, the data is arranged in ascending order of
magnitude and the frequencies are cumulated starting from the top. It starts from zero on the y-
axis and the lower limit of the lowest class interval on the x-axis.
(B) Greater than Ogive: - To plot this ogive, the data are arranged in ascending order of
magnitude and the frequencies are cumulated from the bottom. This curve ends at zero on the y-
axis and at the upper limit of the highest class interval on the x-axis.
Illustrations: - On a graph paper, draw the two ogives for the data given below of the I.Q. of 160
students.
Class interval     60-70  70-80  80-90  90-100  100-110  110-120  120-130  130-140  140-150  150-160
No. of students      2      7     12      28      42       36       18       10        4        1
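A sketch of the "less than" ogive for this data (assuming matplotlib is available); the cumulative frequencies are obtained by adding the frequencies class by class, as described above:

import matplotlib.pyplot as plt

# Upper class limits and frequencies from the I.Q. table above
upper_limits = [70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
frequencies = [2, 7, 12, 28, 42, 36, 18, 10, 4, 1]

# Progressive totals of the frequencies ("less than" cumulative frequencies)
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)
print(cumulative)   # ends at 160, the total number of students

# The curve starts at zero at the lower limit of the lowest class (60)
plt.plot([60] + upper_limits, [0] + cumulative, marker="o")
plt.xlabel("I.Q.")
plt.ylabel("Cumulative number of students")
plt.title("Less than ogive")
plt.show()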
Uses of Ogive
Certain values like median, quartiles, deciles, quartile deviation, coefficient of skewness etc. can
be located using Ogives. It can be used to find the percentage of items having values less than
certain amount.
A stem and leaf diagram provides a visual summary of your data. This diagram provides a partial
sorting of the data and allows you to detect the distributional pattern of the data.
There are three steps for drawing a stem and leaf diagram.
1. Split each data value into two pieces, a stem and a leaf.
2. Group the values by their stems.
3. Write the leaves against each stem, in order, and add a key.
For example, consider the following data:
154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157,
149, and 146
Writing the values out grouped by their leading digits gives:
139, 136, 139
143, 148, 143, 147, 147, 144, 143, 142, 143, 149, 146
154, 153, 156, 151, 157
162, 164
What we have here is almost a stem and leaf diagram. Note that with the data written in this way
you can see what the modal class is (the one with the most values). You can also see the shape of
the distribution: most of the values are in the 140s, with higher or lower values rarer.
To change this into a stem and leaf diagram, we just simplify it a little. Instead of writing out the
full figures each time (143, 143, 144, 143, ...) we write '14' and call this the 'stem' and then write
3, 3, 4, 3, ... (these being the 'leaves'). We would usually, however, write the leaves in order
(with the smallest first). Finally, we must also include a little key so that people know how to
interpret the diagram.
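The same diagram can be produced programmatically, as in this sketch, by splitting each value into a tens "stem" and a units "leaf":

# Data from the example above
data = [154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144,
        143, 139, 142, 143, 156, 151, 164, 157, 149, 146]

# Group the unit digits (leaves) under their tens digits (stems)
stems = {}
for value in sorted(data):          # sorting puts the leaves in order
    stem, leaf = divmod(value, 10)  # e.g. 154 -> stem 15, leaf 4
    stems.setdefault(stem, []).append(leaf)

for stem in sorted(stems):
    print(stem, "|", ", ".join(str(leaf) for leaf in stems[stem]))

print("Key: 14 | 3 represents 143")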
(c) Find, correct to the nearest whole number, the mean number of people in a family.
2.A marine biologist records as a frequency distribution the lengths (L), measured to the nearest
centimeter, of 100 mackerel. The results are given in the table below.
27 < L≤29 2
29 < L ≤ 31 4
31 < L ≤ 33 8
31 < L ≤ 33 21
33 < L ≤ 35 30
37 < L ≤ 39 18
39 < L ≤ 41 12
41 < L ≤ 43 5
100
(a) Construct a cumulative frequency table for the data in the table.
180, 184, 195, 177, 175, 173, 169, 167, 197, 173, 166, 183, 161, 195, 177, 192, 161, 165
5. The following stem and leaf diagram gives the heights in cm of 39 schoolchildren.
Stem   Leaf
13     2, 3, 3, 5, 8
14     1, 1, 1, 4, 5, 5, 9
15     3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9
16     1, 2, 2, 5, 6, 6, 7, 8, 8
17     4, 4, 4, 5, 6, 6
18     0
Key: 13 | 2 represents 132 cm