
Quantitative Tech- intro

Uploaded by

tum chris
Copyright
© © All Rights Reserved

CHAPTER 1: INTRODUCTION

1.1 What is Statistics?

Croxton and Cowden define the word 'statistics' as follows:

"The collection, presentation, analysis and interpretation of the numerical data." This definition
clearly points out four stages in a statistical investigation, namely:

1) Collection of data
2) Presentation of data
3) Analysis of data
4) Interpretation of data

In addition to these, some authors suggest one more stage, namely organization of data.

1.2 Uses of Statistics

a) To present the data in a concise and definite form: Statistics helps in classifying and tabulating
raw data so that it can be processed further and presented clearly to end users.

b) To make complex and large data easy to understand: This is done by presenting the data in
the form of tables, graphs, diagrams, etc., or by condensing the data with the help of averages,
measures of dispersion, etc.

c) For comparison: Tables, averages and measures of dispersion help in comparing different
sets of data.

d) In forming policies: Statistics helps in forming policies, such as a production schedule based on
the relevant sales figures. It is also used in forecasting future demand.

e) Enlarging individual experience: Statistics helps in understanding complex problems, because
conclusions drawn from statistical analysis are more definite and precise than mere statements of
fact.

f) In measuring the magnitude of a phenomenon: - Statistics makes it possible to measure, in
numbers, the population of a country, its industrial growth, its agricultural growth, its educational
level, and so on.

1.3 Limitations of Statistics

1. Statistics does not deal with individual measurements. Since statistics deals with aggregates of
facts, it cannot be used to study the changes that have taken place in individual cases. For
example, the wage earned by a single worker in an industry at any time, taken by itself, is not a
statistical datum, but the wages of all the workers in that industry can be studied statistically.
Similarly, the marks obtained by Kamau of your class or the height of Atieno (also of your class)
are not the subject matter of statistical study, but the average marks or the average height of
your class has statistical relevance.
2. Statistics cannot be used directly to study qualitative phenomena such as morality, intelligence,
beauty, etc., as these cannot be quantified. However, it may be possible to analyze such problems
statistically by expressing them numerically. For example, we may study the intelligence of boys
on the basis of the marks they obtain in an examination.

3. Statistical results are true only on average: - The conclusions obtained statistically are not
universal truths; they are true only under certain conditions. This is because statistics as a
science is less exact than the natural sciences.

4. Statistical data are often approximations rather than exact values. Therefore, statistical
methods can be used only where complete mathematical accuracy is not needed.

5. Statistics, being dependent on figures, can be manipulated and therefore can be used only
when the authenticity of the figures has been proved beyond doubt.

1.4 Distrust of Statistics

It is often said that "statistics can prove anything." It is also said that there are three kinds of
lies: lies, damned lies and statistics, wicked in the order of their naming. A Paris banker said,
"Statistics is like a miniskirt, it covers up essentials but gives you the ideas."

Thus by "distrust of statistics" we mean lack of confidence in statistical statements and methods.

The following reasons account for such views about statistics.

1. Figures are convincing, and therefore people easily believe them.

2. They can be manipulated in such a manner as to establish foregone conclusions.

3. The wrong representation of even correct figures can mislead a reader. For example, suppose John
earned Ksh 400,000 in 1990-1991 and Jane earned Ksh 500,000. Reading this, one might conclude
that Jane is decidedly a better worker than John. On careful examination, however, no such
conclusion is justified, because the period over which Jane earned her Ksh 500,000 is not stated.
Thus, while working with statistics, one should not only avoid outright falsehoods but also be alert
to detect possible distortions of the truth.

1.5 Types of Statistics

Broadly speaking, statistics may be divided into two categories, namely descriptive and inferential
statistics.

When analyzing data, for example the marks achieved by 100 students for a piece of
coursework, it is possible to use both descriptive and inferential statistics in your analysis of their
marks. Typically, in most research conducted on groups of people, you will use both descriptive
and inferential statistics to analyze the results and draw conclusions. So, what are descriptive and
inferential statistics, and how do they differ?

1.5.1 Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or
summarize data in a meaningful way such that, for example, patterns might emerge from the
data. Descriptive statistics do not, however, allow us to draw conclusions beyond the data we
have analyzed or reach conclusions regarding any hypotheses we might have made. They are
simply a way to describe our data.

Descriptive statistics are very important, because if we simply presented our raw data, it would
be hard to visualize what the data was showing, especially if there was a lot of it. Descriptive
statistics therefore allow us to present the data in a more meaningful way, which allows simpler
interpretation of the data. For example, if we had the results of 100 pieces of students'
coursework, we may be interested in the overall performance of those students. We would also
be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this.

Typically, there are two general types of statistics that are used to describe data:

Measures of central tendency: these are ways of describing the central position of a frequency
distribution for a group of data. In this case, the frequency distribution is simply the distribution
and pattern of marks scored by the 100 students from the lowest to the highest. We can describe
this central position using a number of statistics, including the mode, median, and mean.

Measures of spread: these are ways of summarizing a group of data by describing how spread
out the scores are. For example, the mean score of our 100 students may be 65 out of 100.
However, not all students will have scored 65 marks. Rather, their scores will be spread out.
Some will be lower and others higher. Measures of spread help us to summarize how spread out
these scores are. To describe this spread, a number of statistics are available to us, including the
range, quartiles, absolute deviation, variance and standard deviation.
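As a minimal sketch, assuming Python 3.8 or later and a hypothetical set of ten marks chosen so that the mean is 65, the measures named above can be computed with Python's standard `statistics` module. Textbooks differ in how they define quartiles; `statistics.quantiles` uses its default "exclusive" method here, so hand-calculated quartiles may differ slightly. The variance and standard deviation shown are the population versions, treating the ten marks as the whole group being described.

```python
import statistics

# Hypothetical marks (out of 100) for ten students; illustrative values only.
marks = [45, 50, 58, 62, 65, 65, 70, 72, 78, 85]

def describe(data):
    """Return common measures of central tendency and spread for a list of numbers."""
    mean = statistics.mean(data)
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    return {
        # Measures of central tendency
        "mode": statistics.mode(data),
        "median": statistics.median(data),
        "mean": mean,
        # Measures of spread
        "range": max(data) - min(data),
        "quartiles": (q1, q2, q3),
        "mean_abs_dev": sum(abs(x - mean) for x in data) / len(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }

summary = describe(marks)
for name, value in summary.items():
    print(f"{name:>12}: {value}")
```

Here the mean, median and mode all equal 65, while the range (40) and standard deviation show how far the individual marks are spread around that centre.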

When we use descriptive statistics, it is useful to summarize our group of data using a
combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts)
and statistical commentary (i.e., a discussion of the results).

1.5.2 Inferential Statistics

Whilst descriptive statistics examine our immediate group of data (for example, the 100 students'
marks), inferential statistics aim to make inferences from this data in order to make conclusions
that go beyond this data. In other words, inferential statistics are used to make inferences about a
population from a sample in order to generalize (make assumptions about this wider population)
and/or make predictions about the future.
For example, a Board of Examiners may want to compare the performance of 1000 students that
completed an examination. Of these, 500 students are girls and 500 students are boys. The 1000
students represent our "population". Whilst we are interested in the performance of all 1000
students, girls and boys, it may be impractical to examine the marks of all of these students
because of the time and cost required to collate all of their marks. Instead, we can choose to
examine a "sample" of these students and then use the results to make generalizations about the
performance of all 1000 students. For the purpose of our example, we may choose a sample size
of 200 students. Since we are looking to compare boys and girls, we may randomly select 100
girls and 100 boys in our sample. We could then use this, for example, to see if there are any
statistically significant differences in the mean mark between boys and girls, even though we
have not measured all 1000 students.
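The sampling step in this example can be sketched in Python as follows. Everything here is assumed for illustration: the marks are simulated rather than real, the group centres of 64 and 62 are invented, and Welch's t statistic is just one common way of comparing two sample means. Turning the statistic into a significance level needs the t distribution, which the standard library does not provide; `scipy.stats.ttest_ind` with `equal_var=False` is one option.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is repeatable

def simulated_mark(centre):
    """Hypothetical mark: normally distributed around `centre`, clipped to 0-100."""
    return min(100, max(0, round(rng.gauss(centre, 12))))

# Simulated population (hypothetical): 500 girls and 500 boys.
girls = [simulated_mark(64) for _ in range(500)]
boys = [simulated_mark(62) for _ in range(500)]

# Randomly select 100 girls and 100 boys (sampling without replacement).
girls_sample = rng.sample(girls, 100)
boys_sample = rng.sample(boys, 100)

def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (mean_a - mean_b) / standard_error

t = welch_t(girls_sample, boys_sample)
print(f"girls' sample mean: {statistics.mean(girls_sample):.2f}")
print(f"boys' sample mean:  {statistics.mean(boys_sample):.2f}")
print(f"Welch t statistic:  {t:.2f}")
```

Only the 200 sampled marks enter the calculation; the conclusion about all 1000 students is an inference from them.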

1.6 Common Mistakes Committed in Interpretation of Statistics

1. Bias: - Bias means prejudice or preference of the investigator, which creeps in consciously or
unconsciously when trying to prove a particular point.

2. Generalization: - Sometimes, on the basis of the little data available, one may jump to a
conclusion, which leads to erroneous results.

3. Wrong conclusion: - Attributing the characteristics of a group to an individual member of that
group may lead us to draw absurd conclusions.

4. Incomplete classification: - If we fail to give a complete classification, the influence of various
factors may not be properly understood.

5. There may be a wrong use of percentages.

6. Technical mistakes may also occur.

7. Definitions may be inconsistent.

8. Wrong causal inferences may sometimes be drawn.

1.7 Types of Variables

1.7.1 Discrete Variable

A discrete variable is one that cannot take on all values within the limits of the variable. For
example, responses to a five-point rating scale can only take on the values 1, 2, 3, 4, and 5. The
variable cannot have the value 1.7. A variable such as a person's height can take on any value.
Variables that can take on any value and therefore are not discrete are called continuous.
Statistics computed from discrete variables can take many more values than the discrete
variables themselves. The mean on a five-point scale could be 3.117 even though 3.117 is not
possible for an individual score.
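A short illustration with hypothetical ratings: every response is a whole number from 1 to 5, yet their mean is not one of those values.

```python
import statistics

# Hypothetical responses to one item on a five-point rating scale.
ratings = [4, 3, 5, 2, 3, 4, 1, 3, 4, 2]
allowed = {1, 2, 3, 4, 5}

# The variable itself is discrete: reject anything outside 1-5.
if not all(r in allowed for r in ratings):
    raise ValueError("a five-point item can only take the values 1 to 5")

average = statistics.mean(ratings)
print(average)             # a legitimate statistic...
print(average in allowed)  # ...even though no individual response can equal it
```
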
1.7.2 Continuous Variable

A continuous variable is one for which, within the limits of its range, any value is
possible. For example, the variable "Time to solve an anagram problem" is continuous since it
could take 2 minutes, 2.13 minutes, etc. to finish a problem. The variable "Number of correct
answers on a 100-point multiple-choice test" is not a continuous variable since it is not possible
to get 54.12 problems correct. A variable that is not continuous is called "discrete".

1.8 Scales of measurement

1.8.1 Nominal Scale

Nominal measurement consists of assigning items to groups or categories. No quantitative
information is conveyed and no ordering of the items is implied. Nominal scales are therefore
qualitative rather than quantitative. Religious preference, race, and sex are all examples of
nominal scales. Frequency distributions are usually used to analyze data measured on a nominal
scale. The main statistic computed is the mode. Variables measured on a nominal scale are often
referred to as categorical or qualitative variables.
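As a sketch with hypothetical data (the type of vehicle owned by ten respondents), a frequency distribution and the mode of a nominal variable can be produced with `collections.Counter` from Python's standard library. Only counts are used; the categories are never ordered or averaged.

```python
from collections import Counter

# Hypothetical nominal data: type of vehicle owned by ten respondents.
vehicles = ["saloon", "pickup", "saloon", "motorcycle", "saloon",
            "pickup", "bus", "saloon", "motorcycle", "pickup"]

frequency = Counter(vehicles)  # frequency distribution: category -> count
for category, count in frequency.most_common():
    print(f"{category:<11}{count}")

# The mode is the most frequent category.
modal_category, modal_count = frequency.most_common(1)[0]
print("mode:", modal_category)
```
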

1.8.2 Ordinal Scale

Measurements with ordinal scales are ordered in the sense that higher numbers represent higher
values. However, the intervals between the numbers are not necessarily equal. For example, on a
five-point rating scale measuring attitudes toward gun control, the difference between a rating of
2 and a rating of 3 may not represent the same difference as the difference between a rating of 4
and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen
arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It
could just as well have been 0 or -5.

1.8.3 Interval Scale

On interval measurement scales, one unit on the scale represents the same magnitude on the trait
or characteristic being measured across the whole range of the scale. For example, if anxiety
were measured on an interval scale, then a difference between a score of 10 and a score of 11
would represent the same difference in anxiety as would a difference between a score of 50 and a
score of 51. Interval scales do not have a "true" zero point, however, and therefore it is not
possible to make statements about how many times higher one score is than another. For the
anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious
as a person with a score of 15. True interval measurement is somewhere between rare and
nonexistent in the behavioral sciences. No interval-level scale of anxiety such as the one
described in the example actually exists. A good example of an interval scale is the Fahrenheit
scale for temperature. Equal differences on this scale represent equal differences in temperature,
but a temperature of 30 degrees is not twice as warm as one of 15 degrees.
1.8.4 Ratio Scale

Ratio scales are like interval scales except they have true zero points. A good example is the
Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin
is twice as high as a temperature of 150 Kelvin.
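The contrast between the interval and ratio scales can be checked numerically. This sketch assumes the standard conversion K = (F - 32) × 5/9 + 273.15. Equal Fahrenheit differences stay equal after conversion (the interval property), but a ratio of Fahrenheit readings says nothing about the ratio of the temperatures themselves.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to kelvin: K = (F - 32) * 5/9 + 273.15."""
    return (f - 32) * 5 / 9 + 273.15

# Interval property: equal differences in F are equal differences in temperature.
diff_low = fahrenheit_to_kelvin(11) - fahrenheit_to_kelvin(10)
diff_high = fahrenheit_to_kelvin(51) - fahrenheit_to_kelvin(50)

# No true zero on the Fahrenheit scale: 30 F is not "twice as warm" as 15 F.
ratio_readings = 30 / 15
ratio_actual = fahrenheit_to_kelvin(30) / fahrenheit_to_kelvin(15)

# Kelvin has an absolute zero, so its ratios are meaningful.
ratio_kelvin = 300 / 150

print(f"10->11 F and 50->51 F in kelvin: {diff_low:.4f}, {diff_high:.4f}")
print(f"30 F / 15 F as readings: {ratio_readings}, as temperatures: {ratio_actual:.3f}")
print(f"300 K / 150 K: {ratio_kelvin}")
```

The ratio of the actual temperatures comes out at only about 1.03, not 2.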

ASSIGNMENT 1

1. Define statistics

2. Explain how the knowledge of statistics may be applied in business situations.

4. State two ways in which statistics may be misused.

5. Distinguish between descriptive and inferential statistics.

6. State the scale of measurement into which each of the following can be classified:

i. The mass of a bull.

ii. The length of time spent in a restaurant.

iii. The rank of an army officer.

iv. The type of vehicle driven by a celebrity.


CHAPTER 2: COLLECTION OF DATA

For any statistical enquiry, the basic objective is to collect facts and figures relating to a
particular phenomenon for further statistical analysis. The process of counting, enumeration or
measurement together with systematic recording of results is called collection of statistical data.

2.1 Primary and Secondary Data

Primary data is data that you collect yourself using such methods as:

direct observation - lets you focus on details of importance to you; lets you see a system in real
rather than theoretical use (some faults are unlikely or trivial in theory but quite real and
annoying in practice);

surveys - written surveys let you collect considerable quantities of detailed data. You have to
either trust the honesty of the people surveyed or build in self-verifying questions (e.g., questions
9 and 24 ask basically the same thing but using different words - different answers may indicate
the surveyed person is being inconsistent, dishonest or inattentive).

interviews - slow, expensive, and they take people away from their regular jobs, but they allow
in-depth questioning and follow-up questions. They also show non-verbal communication such
as face-pulling, fidgeting, shrugging, hand gestures, sarcastic expressions that add further
meaning to spoken words. e.g., "I think it's a GREAT system" could mean vastly different things
depending on whether the person was sneering at the time! A problem with interviews is that
people might say what they think the interviewer wants to hear; they might avoid being honestly
critical in case their jobs or reputation might suffer.

logs (e.g., fault logs, error logs, complaint logs, transaction logs). Good, empirical, objective
data sources (usually, if they are used well). Can yield lots of valuable data about system
performance over time under different conditions.
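The self-verifying questions described under surveys can be checked mechanically once the responses are entered. In this sketch the respondent IDs, the answers, the 1-5 agreement scale and the tolerance of one point are all hypothetical; a larger gap between the paired questions 9 and 24 flags a respondent for a closer look rather than proving dishonesty.

```python
# Hypothetical survey responses: respondent ID -> {question number: answer on a 1-5 scale}.
# Questions 9 and 24 are assumed to ask the same thing in different words.
responses = {
    "R01": {9: 4, 24: 4},
    "R02": {9: 5, 24: 1},
    "R03": {9: 2, 24: 3},
    "R04": {9: 3, 24: 5},
}

def inconsistent_respondents(responses, q_a=9, q_b=24, tolerance=1):
    """Return IDs whose answers to two paired questions differ by more than `tolerance`.

    Respondents who skipped either question are not flagged here.
    """
    return [rid for rid, answers in responses.items()
            if q_a in answers and q_b in answers
            and abs(answers[q_a] - answers[q_b]) > tolerance]

print(inconsistent_respondents(responses))
```
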

Primary data can be relied on because you know where it came from and what was done to it. It's
like cooking something yourself. You know what went into it.

Secondary data is collected from external sources such as:

• TV, radio, internet
• magazines, newspapers
• reviews
• research articles
• stories told by people you know

There's a lot more secondary data than primary data, and secondary data is a whole lot cheaper
and easier to acquire than primary data. The problem is that often the reliability, accuracy and
integrity of the data is uncertain. Who collected it? Can they be trusted? Did they do any
preprocessing of the data? Is it biased? How old is it? Where was it collected? Can the data be
verified, or does it have to be taken on faith?

Often secondary data has been pre-processed to give totals or averages and the original details
are lost so you can't verify it by replicating the methods used by the original data collectors.

In short, primary data is expensive and difficult to acquire, but it's trustworthy. Secondary data is
cheap and easy to collect, but must be treated with caution.

2.2 Methods of collecting Primary data

In primary data collection, you collect the data yourself using methods such as interviews and
questionnaires. The key point here is that the data you collect is unique to you and your research
and, until you publish, no one else has access to it.

There are many methods of collecting primary data and the main methods include:

• questionnaires
• interviews
• focus group interviews
• observation
• case-studies
• diaries
• critical incidents
• portfolios

The primary data, which is generated by the above methods, may be qualitative in nature
(usually in the form of words) or quantitative (usually in the form of numbers or where you can
make counts of words used). We briefly outline these methods but you should also read around
the various methods.

Questionnaires

Questionnaires are a popular means of collecting data, but are difficult to design and often
require many rewrites before an acceptable questionnaire is produced.

Advantages:

• Can be used as a method in its own right or as a basis for interviewing or a telephone survey.
• Can be posted, e-mailed or faxed.
• Can cover a large number of people or organizations.
• Wide geographic coverage.
• Relatively cheap.
• No prior arrangements are needed.
• Avoids embarrassment on the part of the respondent.
• Respondent can consider responses.
• Possible anonymity of respondent.
• No interviewer bias.

Disadvantages:

• Design problems.
• Questions have to be relatively simple.
• Historically low response rate (although inducements may help).
• Time delay whilst waiting for responses to be returned.
• Require a return deadline.
• Several reminders may be required.
• Assumes no literacy problems.
• No control over who completes it.
• Not possible to give assistance if required.
• Problems with incomplete questionnaires.
• Replies not spontaneous and independent of each other.
• Respondent can read all questions beforehand and then decide whether to complete or
not, for example because it is too long, too complex, uninteresting, or too personal.

Interviews

Interviewing is a technique that is primarily used to gain an understanding of the underlying
reasons and motivations for people’s attitudes, preferences or behavior. Interviews can be
undertaken on a personal one-to-one basis or in a group. They can be conducted at work, at
home, in the street or in a shopping center, or some other agreed location.

Personal interview

Advantages:

• Serious approach by respondent resulting in accurate information.
• Good response rate.
• Completed and immediate.
• Possible in-depth questions.
• Interviewer in control and can give help if there is a problem.
• Can investigate motives and feelings.
• Can use recording equipment.
• Characteristics of respondent assessed: tone of voice, facial expression, hesitation, etc.
• Can use props.
• If one interviewer used, uniformity of approach.
• Used to pilot other methods.

Disadvantages:

• Need to set up interviews.
• Time consuming.
• Geographic limitations.
• Can be expensive.
• Normally need a set of questions.
• Respondent bias: tendency to please or impress, create false personal image, or end
interview quickly.
• Embarrassment possible if personal questions.
• Transcription and analysis can present problems: subjectivity.
• If many interviewers, training required.

Case-studies

The term case-study usually refers to a fairly intensive examination of a single unit such as a
person, a small group of people, or a single company. Case-studies involve measuring what is
there and how it got there. In this sense, it is historical. It can enable the researcher to explore,
unravel and understand problems, issues and relationships. It cannot, however, allow the
researcher to generalize, that is, to argue that from one case-study the results, findings or theory
developed apply to other similar case-studies. The case looked at may be unique and, therefore
not representative of other instances. It is, of course, possible to look at several case-studies to
represent certain features of management that we are interested in studying. The case-study
approach is often used to make practical improvements. Contributions to general knowledge are
incidental.

The case-study method has four steps:

1. Determine the present situation.

2. Gather background information about the past and key variables.

3. Test hypotheses. The background information collected will have been analysed for possible
hypotheses. In this step, specific evidence about each hypothesis can be gathered. This step aims
to eliminate possibilities which conflict with the evidence collected and to gain confidence for
the important hypotheses. The culmination of this step might be the development of an
experimental design to test out more rigorously the hypotheses developed, or it might be to take
action to remedy the problem.

4. Take remedial action. The aim is to check that the hypotheses tested actually work out in
practice. Some action, correction or improvement is made and a re-check carried out on the
situation to see what effect the change has brought about.

The case-study enables rich information to be gathered from which potentially useful hypotheses
can be generated. It can, however, be a time-consuming process. It is also inefficient for researching
situations which are already well structured and where the important variables have been
identified. Case-studies also lack utility when the aim is to reach rigorous conclusions or to
determine precise relationships between variables.

Diaries

A diary is a way of gathering information about the way individuals spend their time on
professional activities. They are not about records of engagements or personal journals of
thought! Diaries can record either quantitative or qualitative data, and in management research
can provide information about work patterns and activities.

Advantages:

• Useful for collecting information from employees.
• Different writers compared and contrasted simultaneously.
• Allows the researcher freedom to move from one organization to another.
• Researcher not personally involved.
• Diaries can be used as a preliminary or basis for intensive interviewing.
• Used as an alternative to direct observation or where resources are limited.

Disadvantages:

• Subjects need to be clear about what they are being asked to do, why, and what you plan
to do with the data.
• Diarists need to be of a certain educational level.
• Some structure is necessary to give the diarist focus, for example, a list of headings.
• Encouragement and reassurance are needed, as completing a diary is time-consuming and
can be irritating after a while.
• Progress needs checking from time to time.
• Confidentiality is required, as content may be critical.
• Analysis can be problematic, so you need to consider how responses will be coded before the
subjects start filling in diaries.

Portfolios

A measure of a manager’s ability may be expressed in terms of the number and duration of
‘issues’ or problems being tackled at any one time. Compiling a problem portfolio involves
recording information about how each problem arose, the methods used to solve it, difficulties
encountered, etc. This analysis also raises questions about the person’s use of time. What
proportion of time is occupied in checking; in handling problems given by others; on self-
generated problems; on ‘top-priority’ problems; on minor issues, etc? The main problem with
this method and the use of diaries is getting people to agree to record everything in sufficient
detail for you to analyze. It is very time-consuming!

2.3 Sampling

Collecting data is time consuming and expensive, even for relatively small amounts of data.
Hence, it is highly unlikely that a complete population will be investigated. Because of the time
and cost elements the amount of data you collect will be limited and the number of people or
organizations you contact will be small in number. You will, therefore, have to take a sample and
usually a small sample.

Sampling theory says a correctly taken sample of an appropriate size will yield results that can be
applied to the population as a whole. There is a lot in this statement but the two fundamental
questions to ensure generalization are:

1. How is a sample taken correctly?

2. How big should the sample be?

The answer to the second question is ‘as large as possible given the circumstances’. It is like
answering the question ‘How long is a piece of string?’ It all depends on the circumstances.

Whilst we do not expect you to normally generalize your results and take a large sample, we do
expect that you follow a recognized sampling procedure, such that, if the sample was increased
generalization would be possible. You therefore need to know some of the basics of sampling.
This will be done by reference to the following example.

The theory of sampling is based on random samples – where all items in the population have the
same chance of being selected as sample units. Random samples can be drawn in a number of
ways but are usually based on having some information about population members. This
information is usually in the form of an alphabetical list – called the sampling frame. Three types
of random sample can be drawn – a simple random sample (SRS), a stratified sample and a
systematic sample.

2.3.1 Simple random sampling

Simple random sampling can be carried out in two ways – the lottery method and using random
numbers.

The lottery method involves:

• transferring each person’s name from the list onto a separate piece of paper
• placing the pieces of paper in a container and mixing them thoroughly
• having someone select the required number of pieces without looking
• taking the names selected as the simple random sample.

This is basically similar to a game of bingo or the national lottery. This procedure is easy to carry
out, especially if both population and sample are small, but can be tedious and time consuming
for large populations or large samples.
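The lottery method can be simulated in Python: shuffling a list plays the role of mixing the slips, and taking the first k items plays the role of drawing without looking. The frame of 90 names is hypothetical. In practice `random.sample(frame, 9)` performs the same kind of selection in one call.

```python
import random

def lottery_sample(names, k, rng=random):
    """Simulate the lottery method: one slip per name, mix thoroughly, draw k without looking."""
    slips = list(names)  # write each name on its own slip of paper
    rng.shuffle(slips)   # mix the slips in the container
    return slips[:k]     # draw k slips; a slip cannot be drawn twice

# Hypothetical sampling frame of 90 people.
frame = [f"Person {i:02d}" for i in range(90)]
sample = lottery_sample(frame, 9, random.Random(1))
print(sample)
```
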

Alternatively random numbers can be used. Random numbers are strings of digits that have been
generated by the lottery method and can be found in books of statistical tables. An example of
these is:

03 47 43 73 86 36 96 47 36 61

97 74 24 67 62 42 81 14 57 20

16 76 62 27 66 56 50 26 71 07

12 56 85 99 26 96 96 68 27 31

55 59 56 35 64 38 04 80 46 22

Random numbers tend to be written in pairs and blocks of 5 by 5 to make reading easy.
However, care is needed when reading these tables. The numbers can be read in any direction, but
they should be read as a single string of digits, i.e., left to right as 0, 3, 4, 7 and so on, or top to
bottom as 0, 9, 1, 1, 5, 3, 7 and so on. It is usual to read left to right.

The random number method involves:

• Allocating a number to each person on the list (each number must consist of the same
number of digits so that the tables can be read consistently).
• Finding a starting point at random in the tables (close your eyes and point).
• Reading off the digits.
• Taking the people whose numbers match the digits read as the sample units.

For the example of selecting nine people at random from 90:

a) The sampling frame is the list of 90 people. Number this list 00, 01, 02, …, 89. Note that each
number has two digits and the numbering starts from 00.

b) Suppose a starting point is found at random from the random number tables and let this
number be 16. Then the person that has been numbered 16 is the first sample unit.

c) Let the next two digits be 76, then the person numbered 76 is the second sample unit.

This procedure is repeated until the nine people have been identified.

d) Any number occurring for a second time is ignored, as is any two-digit number over 89.
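The random number procedure can be sketched in Python using the table reproduced earlier. This assumes the table is read left to right as one string of digits, taken two at a time because the frame is numbered 00 to 89. Starting at the pair "16" (the first pair of the third row) reproduces the first two sample units in the worked example, and numbers above 89 or already chosen are skipped as in step d).

```python
# The random number table from the text, read left to right as one string of digits.
TABLE = """
03 47 43 73 86 36 96 47 36 61
97 74 24 67 62 42 81 14 57 20
16 76 62 27 66 56 50 26 71 07
12 56 85 99 26 96 96 68 27 31
55 59 56 35 64 38 04 80 46 22
"""

def sample_from_table(table, start_pair, k, population_size):
    """Select k distinct two-digit numbers in 0..population_size-1, starting at a given pair."""
    digits = "".join(table.split())
    pairs = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
    chosen = []
    for number in pairs[start_pair:]:
        if number >= population_size or number in chosen:
            continue  # skip numbers outside the frame and repeats
        chosen.append(number)
        if len(chosen) == k:
            return chosen
    raise ValueError("table exhausted before the sample was complete")

# Start at "16", the 21st pair (index 20), as in the worked example.
print(sample_from_table(TABLE, start_pair=20, k=9, population_size=90))
```

Starting instead at the very first pair shows the skipping rule at work: 96 and 97 fall outside the frame, and the second 47 and 36 are repeats.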

Simple random sampling is used as the basis for many other sampling methods, but has
two disadvantages:

• A sampling frame is required. This may not exist, may not be available, or may be incomplete.
• The procedure is unbiased but the sample may be biased. For instance, if the 90 people
are a mixture of men and women and all those selected were men, this would be a biased sample.

2.3.2 Stratified Sampling

To overcome the second problem above, a stratified sample can be taken. In this the population
structure is reflected in the sample structure, with respect to some criterion.

For example, suppose the 90 people consist of 30 men and 60 women. If gender is the criterion
for stratification and the sample size is nine, then:

Men:   30/90 × 9 = 3
Women: 60/90 × 9 = 6
Total:           9

Thus, the sample reflects the population structure in terms of gender.

The three men and six women would then be selected by simple random sampling e.g., random
numbers.

The problem with this approach is that the criterion for stratification (e.g., age, sex, job
description) is chosen by you; it is subjective and may not be the best or most appropriate
criterion. Also, a more detailed sampling frame is required.
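A minimal sketch of proportional stratified sampling, assuming the 30 men and 60 women from the example are labelled with hypothetical IDs. The sample is first allocated to strata in proportion to their size and then drawn by simple random sampling within each stratum. Simple rounding is assumed for the allocation; with awkward stratum sizes the rounded shares may not add up exactly to the sample size and would need adjusting by hand.

```python
import random

def stratified_sample(strata, n, rng=random):
    """Proportional stratified sample: allocate n across strata by size, then SRS within each.

    `strata` maps a stratum name to the list of its members. Shares are rounded,
    which is assumed (not guaranteed) to add up to n.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)    # e.g. 9 * 30 / 90 = 3
        sample[name] = rng.sample(members, share)  # simple random sample within the stratum
    return sample

# Hypothetical frame matching the text: 30 men and 60 women, sample size nine.
strata = {
    "men": [f"M{i:02d}" for i in range(30)],
    "women": [f"W{i:02d}" for i in range(60)],
}
result = stratified_sample(strata, 9, random.Random(7))
print({name: len(chosen) for name, chosen in result.items()})
```
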

2.3.3 Systematic sampling


Whilst not truly random, this method is used extensively because it is easy to operate and
quick, even when the population and the sample are large. For example, for a population of 90
and a sample of nine:

Split the sampling frame (numbered 00 to 89) into nine equal groups of ten, i.e.

00 to 09

10 to 19

Etc.

80 to 89

Select a number between 0 and 9 using random number tables.

Suppose this number is 6.

Person numbered 6 is chosen.

Then the people numbered 16, 26, 36, 46, 56, 66, 76 and 86 are the remaining sample units.
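The frame-based procedure can be sketched as follows, assuming the frame is numbered 00 to 89 and that the population size is a multiple of the sample size, so the sampling interval k = 90 / 9 = 10 is exact.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Systematic sample: interval k = N // n, random start in the first group, then every k-th."""
    k = len(frame) // n       # sampling interval, 90 // 9 = 10
    start = rng.randrange(k)  # random start between 0 and k - 1
    return [frame[i] for i in range(start, start + n * k, k)]

frame = list(range(90))       # people numbered 00 to 89
chosen = systematic_sample(frame, 9, random.Random(3))
print(chosen)
```

If the random start is 6, the result is exactly the sample in the text: 6, 16, 26, ..., 86.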

If no sampling frame is available, direct access to the population is necessary, for example to the
customers of a business such as a leisure center, restaurant or museum.

Systematic sampling can then be used by selecting a random number, say 25.

Then the 25th person to enter is the first sample unit.

The 50th person to enter is the second sample unit.

This process is carried on until the required sample size is met.
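When there is no frame, the same idea can be applied to people as they arrive. In this sketch the stream of customers is hypothetical, and the random number (25 in the text) is used as the interval, so the 25th, 50th, 75th, ... arrivals are taken until the sample is complete.

```python
from itertools import count, islice

def every_kth_arrival(arrivals, k, n):
    """Return the k-th, 2k-th, ..., (n*k)-th items of an arrival stream (counting from 1)."""
    # islice counts from 0, so the k-th arrival sits at position k - 1.
    return list(islice(arrivals, k - 1, n * k, k))

# Hypothetical, open-ended stream of customers entering a restaurant, numbered from 1.
customers = (f"customer {i}" for i in count(1))
print(every_kth_arrival(customers, k=25, n=4))
```
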

This approach usually generates a good cross section of the population. However, you may need
a team of people when no sampling frame exists to help with counting, interviewing, etc.

ASSIGNMENT 2

1. Distinguish between primary and secondary data.

2. Discuss the various methods of data collection. Indicate the situations in which each of these
methods should be used.
3. What is sampling?

4. State four reasons why it is important to study a sample instead of the whole population.

5. Discuss the various sampling methods.

CHAPTER 3: ORGANIZATION AND REPRESENTATION OF DATA

3.1 Introduction

Graphs and diagrams leave a lasting impression on the mind and make the salient features of the
data easy to understand. Forecasting also becomes easier with the help of graphs. Thus, it is
worth studying the graphical representation of data.

3.2 General Principles of Constructing Diagrams

1. The diagrams should be simple.

2. Each diagram must be given a clear, concise and suitable title, without sacrificing clarity.

3. A proper proportion between height and width must be maintained in order to avoid an
unpleasant look.

4. Select a proper scale; it should preferably be in even numbers or in multiples of five or ten,
e.g. 25, 50, 75 or 10, 20, 30, 40, etc., although there is no fixed rule.

5. Where certain points need clarification, add footnotes.

6. An index (key) explaining the different lines, shades and colors should be given.

7. Diagrams should be absolutely neat and clean.

"The important point that must be borne in mind at all times that the pictorial representation
chosen for any situation must depict the true relationship and point out the proper conclusion.
Above all the chart must be honest.” .... C. W. LOWE.

3.3 Bar Diagrams

3.3.1 Simple 'Bar diagram'

It represents only one variable. For example, sales, production or population figures for various
years may be shown by simple bar charts. Since the bars are of the same width and vary only in
height (or length), it becomes very easy for readers to study the relationship. Simple bar
diagrams are very popular in practice. A bar chart can be either vertical or horizontal; vertical
bars are more popular.

Illustration: - The following table gives the birth rate per thousand of different countries over a
certain period of time.

Country Birth rate

India 33
China 40
Germany 15
New Zealand 30
U.K. 20
Sweden 15

TASK 1

Represent the above data by a suitable diagram and make conclusions about the birth rates.

3.3.2 Sub - divided Bar Diagram

While constructing such a diagram, the various components in each bar should be kept in the
same order. A common and helpful arrangement is that of presenting each bar in the order of
magnitude with the largest component at the bottom and the smallest at the top. The components
are shown with different shades or colors with a proper index.
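The stacking rule can be sketched in Python as a check before drawing. This is an illustrative sketch, not a plotting routine: the helper names are hypothetical, and it assumes the common order is fixed by ranking each component by its total across all bars (largest at the bottom), so that every bar keeps the same order. The figures are those of University 'X' in the illustration that follows.

```python
def stacking_order(bars):
    """Rank components by their total over all bars, largest first (bottom)."""
    totals = {}
    for bar in bars.values():
        for name, value in bar.items():
            totals[name] = totals.get(name, 0) + value
    return sorted(totals, key=totals.get, reverse=True)


def stacked_segments(bar, order):
    """Return (component, bottom, top) for each segment of one sub-divided bar."""
    segments = []
    bottom = 0
    for name in order:
        top = bottom + bar[name]
        segments.append((name, bottom, top))
        bottom = top  # the next component starts where this one ends
    return segments


students = {
    "1968-69": {"Arts": 20_000, "Science": 10_000, "Law": 5_000},
    "1969-70": {"Arts": 26_000, "Science": 9_000, "Law": 7_000},
    "1970-71": {"Arts": 31_000, "Science": 9_500, "Law": 7_500},
}
order = stacking_order(students)
for year, bar in students.items():
    print(year, stacked_segments(bar, order))
```

The top of the last segment of each bar equals that year's total, which is a quick way to confirm the bar heights before shading the components.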

Illustration: - During 1968-71, the numbers of students in University 'X' were as follows.

Year Arts Science Law Total


1968-69 20,000 10,000 5,000 35,000
1969-70 26,000 9,000 7,000 42,000
1970-71 31,000 9,500 7,500 48,000

TASK 2

Represent the data by a suitable diagram.

3.3.3 Multiple Bar Diagram

This method can be used for data which is made up of two or more components. In this method
the components are shown as separate adjoining bars. The height of each bar represents the
actual value of the component. The components are shown by different shades or colors. Multiple
bar charts are used where only the changes in the actual values of the component figures are of
interest.

Illustration: - The table below gives data relating to the exports and imports of a certain country
X (in thousands of dollars) during the four years ending in 1930 - 31.

Year Import Export


1927-28 319 250
1928-29 339 263
1929-30 345 258
1930-31 308 206

TASK 3

Represent the data by a suitable diagram


3.3.4 Deviation Bar Charts

Deviation bars are used to represent net quantities (excess or deficit), e.g. net profit, net loss,
net exports or imports, swings in voting etc. Such bars have both positive and negative values.

Positive values lie above the base line and negative values lie below it.

Illustration: -

Year Sales Net profits


1985-86 10% 50%
1986-87 14% -20%
1987-88 12% -10%

TASK 4

Present the above data by a suitable diagram

3.4 Pie Chart

i) Geometrically, the area of a sector of a circle (bounded by two radii) is proportional to the
angle at its center. It is therefore sufficient to draw angles at the center proportional to the
original figures; this makes the areas of the sectors proportional to the basic figures.

For example, if the total is 1000 and one of the components is 200, then the angle for that
component will be (200 / 1000) × 360° = 72°.

ii) When a statistical phenomenon is made up of numerous components (say, four or more), bar
charts are not suitable to represent them, because in this situation they become very complex
and their visual impression is unclear. A pie diagram is suitable for such situations. It is a
circular diagram: a circle (pie) divided by radii into sectors (like slices of a cake or pie). The
area of each sector is proportional to the size of the component it represents.
Pie charts are useful to compare different parts of a whole amount. They are often used to present
financial information. For example, a company's expenditure can be shown as the sum of its parts,
including different expense categories such as salaries, borrowing interest, taxation and general
running costs (e.g. rent, electricity, heating etc.).

A pie chart is a circular chart in which the circle is divided into sectors. Each sector visually
represents an item in a data set to match the amount of the item as a percentage or fraction of the
total data set.
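The angle rule from (i) can be sketched in Python; `sector_angles` is a hypothetical helper that multiplies each component's share of the total by 360°.

```python
def sector_angles(values):
    """Return the angle at the center (in degrees) for each pie-chart component.

    angle = component / total * 360, computed as component * 360 / total so
    whole-number inputs such as these give exact results.
    """
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}


# The example in (i): a component of 200 out of a total of 1000.
angles = sector_angles({"component": 200, "remainder": 800})
print(angles)
```

The angles always add up to 360°, which is a useful check before drawing the sectors with a protractor.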

Illustration: A family's weekly expenditure on its house mortgage, food and fuel is as follows:

expenditure Ksh
Mortgage 300
Food 225
Fuel 75

TASK 5

Draw a pie chart to display the information.

3.5 Graphs

A graph is a visual representation of data by a continuous curve on squared (graph) paper. Like
diagrams, graphs are attractive and eye-catching, giving a bird's-eye view of the data and
revealing their inner pattern.

Graphs of Frequency Distributions: -

The methods used to represent grouped data are: -

1. Histogram

2. Frequency Polygon

3. Frequency Curve

4. Ogive or Cumulative Frequency Curve

3.5.1 Histogram

It is defined as a pictorial representation of a grouped frequency distribution by means of
adjacent rectangles, whose areas are proportional to the frequencies.

To construct a histogram, the class intervals are plotted along the x-axis and the corresponding
frequencies along the y-axis. The rectangles are constructed so that the width of each rectangle
is equal to the length of the class and its height is proportional to the frequency of the class.
If all the classes have equal widths, all the rectangles stand on equal bases. If the classes have
unequal widths, the rectangles stand on unequal bases, and their heights must be adjusted
(frequency divided by class width) so that the areas remain proportional to the frequencies. For
open-ended classes, the histogram is constructed after making certain assumptions. Because the
rectangles are adjacent, leaving no gaps, class intervals of the inclusive type must first be made
continuous by adjusting their end points (class limits).

For example, at a book fair you may want to determine which books were most popular: the
high-priced books, the low-priced books, the books most neglected, etc. Let us say you sold a total
of 31 books at this book fair at the following prices.

$2, $1, $2, $2, $3, $5, $6, $17, $17, $7, $15, $7, $7, $18, $8, $10, $10, $9, $13, $11, $12, $12,
$12, $14, $16, $18, $20, $24, $21, $22, $25.

The prices range from $1 to $25. Divide this range into a number of groups, called class intervals.

Typically, between 5 and 20 class intervals are best for a frequency histogram.

The first class interval must include the lowest price in the data, and the last interval the
highest price. Also make sure that the intervals do not overlap, so that no price falls into two
class intervals. For example, if the class intervals were 0-5, 5-10, 10-15 and so on, the price $10
would fall in both 5-10 and 10-15. Instead, if we use $1-$5, $6-$10, and so on, the class intervals
will be mutually exclusive.
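As a check on a hand tally, a short Python sketch can count the 31 prices into the classes $1-$5, $6-$10, ..., $21-$25. The helper `tally` is hypothetical and assumes whole-dollar prices, so that these inclusive classes leave no gaps.

```python
prices = [2, 1, 2, 2, 3, 5, 6, 17, 17, 7, 15, 7, 7, 18, 8, 10, 10, 9, 13, 11,
          12, 12, 12, 14, 16, 18, 20, 24, 21, 22, 25]


def tally(values, lowest, width, n_classes):
    """Count whole-number values into inclusive classes of equal width.

    Class 0 covers lowest .. lowest + width - 1, class 1 the next width values, etc.
    """
    counts = [0] * n_classes
    for v in values:
        index = (v - lowest) // width  # which class the value belongs to
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the class intervals")
        counts[index] += 1
    return counts


labels = [f"{lo}-{lo + 4}" for lo in range(1, 26, 5)]  # 1-5, 6-10, ..., 21-25
counts = tally(prices, lowest=1, width=5, n_classes=5)
for label, f in zip(labels, counts):
    print(label, f)
print("Total", sum(counts))
```

Checking that the total equals the number of observations (31) confirms that no price was missed or counted twice.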

We now have the following frequency distribution of the books sold at the book fair.

Class interval ($) Frequency

1-5 6
6-10 8
11-15 7
16-20 6
21-25 4
Total Σf = 31

Note that each class interval is of equal width, i.e. $5 inclusive, so the frequency histogram can
now be drawn with rectangles of equal width whose heights are the class frequencies.

3.5.2 Frequency Curve

A frequency curve is like a frequency polygon, except that instead of using straight line segments,
a smooth curve is used to connect the points. For the above data, the points are the class
mid-points ($3, $8, $13, $18, $23) plotted against the corresponding frequencies.
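The plotting points for a frequency polygon or curve can be computed with a minimal Python sketch. It assumes the book-fair classes and the frequencies obtained by tallying the 31 prices listed above; the helper name `plotting_points` is hypothetical.

```python
def plotting_points(class_limits, frequencies):
    """Pair each class mid-point, (lower + upper) / 2, with its frequency."""
    return [((lower + upper) / 2, f)
            for (lower, upper), f in zip(class_limits, frequencies)]


book_classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25)]
book_frequencies = [6, 8, 7, 6, 4]  # from tallying the 31 prices above
print(plotting_points(book_classes, book_frequencies))
```

The same points serve both diagrams: joined by straight lines they give the frequency polygon, and joined freehand by a smooth line they give the frequency curve.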

3.5.3 Ogives or Cumulative Frequency Curves

When frequencies are added up progressively, class by class, the running totals are called
cumulative frequencies. The curve obtained by plotting cumulative frequencies is called a
cumulative frequency curve or an ogive (pronounced ojive).

To construct an Ogive: -

1) Add up the progressive totals of frequencies, class by class, to get the cumulative frequencies.
2) Plot classes on the horizontal (x-axis) and cumulative frequencies on the vertical (y-axis).

3) Join the points by a smooth curve.

Note that a less than ogive starts at (i) zero on the vertical axis and (ii) the outside (lower)
class limit of the first class. In most cases it is S-shaped.

Note that cumulative frequencies are plotted against the 'limits' of the classes to which they refer.

(A) Less than Ogive: - To plot a less than ogive, the data is arranged in ascending order of
magnitude and the frequencies are cumulated starting from the top. It starts from zero on the y-
axis and the lower limit of the lowest class interval on the x-axis.

(B) Greater than Ogive: - To plot this ogive, the data are arranged in ascending order of
magnitude and the frequencies are cumulated from the bottom. This curve ends at zero on the
y-axis and the upper limit of the highest class interval on the x-axis.

Illustration: - On graph paper, draw the two ogives for the following data on the I.Q. of 160
students.

I.Q. (class interval) No. of students

60-70 2
70-80 7
80-90 12
90-100 28
100-110 42
110-120 36
120-130 18
130-140 10
140-150 4
150-160 1
Total 160

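The cumulative frequencies for both ogives can be sketched in Python using `itertools.accumulate`; the variable names are illustrative. Following the rules above, the less than values are plotted against upper class limits and the greater than values against lower class limits.

```python
from itertools import accumulate

# I.Q. class limits and numbers of students, from the table above.
limits = [(60, 70), (70, 80), (80, 90), (90, 100), (100, 110),
          (110, 120), (120, 130), (130, 140), (140, 150), (150, 160)]
students = [2, 7, 12, 28, 42, 36, 18, 10, 4, 1]

# Less than ogive: cumulate from the top, plot against upper limits,
# starting from zero at the lower limit of the lowest class.
less_than = [(limits[0][0], 0)] + list(
    zip([upper for _, upper in limits], accumulate(students)))

# Greater than ogive: cumulate from the bottom, plot against lower limits,
# ending at zero at the upper limit of the highest class.
from_bottom = list(accumulate(reversed(students)))[::-1]
greater_than = list(
    zip([lower for lower, _ in limits], from_bottom)) + [(limits[-1][1], 0)]

print("Less than:   ", less_than)
print("Greater than:", greater_than)
```

Both lists end (or start) at the total of 160 students, and the two ogives intersect at the median.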
Uses of Ogive

Certain values like the median, quartiles, deciles, quartile deviation, coefficient of skewness etc.
can be located using ogives. An ogive can also be used to find the percentage of items having
values less than a certain amount.
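Reading a value such as the median off a less than ogive can be approximated numerically. The sketch below assumes straight lines between the plotted points (a smooth freehand curve would give slightly different readings); for the median this is the familiar interpolation formula L + ((N/2 - cf) / f) × h. The function name `read_ogive` is hypothetical, and the points are those of the I.Q. data in the illustration above.

```python
def read_ogive(points, target):
    """Return the x-value at which a less than ogive reaches `target`.

    points: (class limit, cumulative frequency) pairs in ascending order,
    beginning with (lowest lower limit, 0). Straight lines are assumed
    between neighboring points.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < c1 and c0 <= target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the range of the ogive")


# Less than ogive points for the I.Q. data of 160 students.
iq_ogive = [(60, 0), (70, 2), (80, 9), (90, 21), (100, 49), (110, 91),
            (120, 127), (130, 145), (140, 155), (150, 159), (160, 160)]
n = iq_ogive[-1][1]
for name, fraction in [("Q1", 0.25), ("Median", 0.5), ("Q3", 0.75)]:
    print(name, round(read_ogive(iq_ogive, fraction * n), 2))
```

On graph paper the same reading is made by drawing a horizontal line from N/2 (or N/4, 3N/4) on the y-axis to the curve and dropping a perpendicular to the x-axis.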

3.6 Stem and Leaf Diagram

A stem and leaf diagram provides a visual summary of your data. This diagram provides a partial
sorting of the data and allows you to detect the distributional pattern of the data.

There are three steps for drawing a stem and leaf diagram.
1. Split the data into two pieces, stem and leaf.

2. Arrange the stems from low to high.

3. Attach each leaf to the appropriate stem.
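The three steps can be sketched in Python for whole-number data, assuming the last digit is the leaf and the remaining digits form the stem; `stem_and_leaf` is a hypothetical helper, and the data are the heights from the example that follows.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).

    Assumes non-negative whole numbers with one-digit leaves.
    """
    leaves_by_stem = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)  # step 1: split into stem and leaf
        leaves_by_stem[stem].append(leaf)
    # Step 2: stems from low to high; step 3: leaves (in order) on their stem.
    return {stem: sorted(leaves_by_stem[stem]) for stem in sorted(leaves_by_stem)}


heights = [154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144,
           143, 139, 142, 143, 156, 151, 164, 157, 149, 146]
for stem, leaves in stem_and_leaf(heights).items():
    print(stem, ", ".join(str(leaf) for leaf in leaves))
print("KEY: 14 2, represents 142")
```

Stems with no values do not appear in the result; when drawing the diagram by hand, such stems should still be written down with no leaves so that gaps in the data remain visible.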

Example: Suppose you have the heights of 21 people as follows:

154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157,
149 and 146

If these values are written out in rows according to their tens (the 130s, 140s, 150s and 160s),
we almost have a stem and leaf diagram. Note that with the data written in this way you can see
what the modal class is (the one with the most values). You can also see the shape of the
distribution: most of the values are in the 140s, with higher or lower values rarer.

To change this into a stem and leaf diagram, we just simplify it a little. Instead of writing out the
full figures each time (143, 143, 144, 143, ...) we write '14' and call this the 'stem' and then write
3, 3, 4, 3, ... (these being the 'leaves'). We would usually, however, write the leaves in order
(with the smallest first). Finally, we must also include a little key so that people know how to
interpret the diagram.

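So, we finish up with:

Stem Leaf
13 6, 9, 9
14 2, 3, 3, 3, 3, 4, 6, 7, 7, 8, 9
15 1, 3, 4, 6, 7
16 2, 4

KEY
14 2, represents 142
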
ASSIGNMENT 3

(a) How many families are represented?

(b) Write down the mode of the distribution.

(c) Find, correct to the nearest whole number, the mean number of people in a family.

2. A marine biologist records as a frequency distribution the lengths (L), measured to the nearest
centimeter, of 100 mackerel. The results are given in the table below.

Length of mackerel (L cm) Number of mackerel

27 < L ≤ 29 2
29 < L ≤ 31 4
31 < L ≤ 33 8
33 < L ≤ 35 21
35 < L ≤ 37 30
37 < L ≤ 39 18
39 < L ≤ 41 12
41 < L ≤ 43 5
Total 100

(a) Construct a cumulative frequency table for the data in the table.

(b) Draw a cumulative frequency curve.


3. The following table shows the age distribution of teachers who smoke at Fegi High School.

(a) Calculate an estimate of the mean smoking age.

(b) Construct a histogram to represent this data.

4. The following results give the heights of sunflowers in centimeters.

180, 184, 195, 177, 175, 173, 169, 167, 197, 173, 166, 183, 161, 195, 177, 192, 161, 165

Represent the data by a stem and leaf diagram.

5. The following stem and leaf diagram gives the heights in cm of 39 schoolchildren.

Stem Leaf
13 2, 3, 3, 5, 8
14 1, 1, 1, 4, 5, 5, 9
15 3, 4, 4, 6, 6, 7, 7, 7, 8, 9, 9
16 1, 2, 2, 5, 6, 6, 7, 8, 8
17 4, 4, 4, 5, 6, 6
18 0

KEY
13 2, represents 132cm

(a) (i) State the lower quartile height,

(ii) State the median height


(iii) State the upper quartile height.
