Computer and Statistics
Website: www.jptsonline.org
TABLE OF CONTENTS
CHAPTER ONE: INTRODUCTION TO COMPUTER
CHAPTER TWO: CLASSIFICATION OF COMPUTERS
CHAPTER THREE: APPLICATIONS OF COMPUTERS
CHAPTER FOUR: ADVANTAGES OF COMPUTERS
CHAPTER FIVE: LIMITATIONS OF COMPUTERS
CHAPTER SIX: COMPONENTS OF A COMPUTER SYSTEM
CHAPTER SEVEN: UNITS OF MEASUREMENT
JOINT PROFESSIONALS TRAINING AND SUPPORT INTERNATIONAL www.jptsonline.org
CHAPTER ONE
INTRODUCTION TO COMPUTER
Today, almost all of us make use of computers in one way or another. Computers
find applications in various fields such as engineering, medicine, commerce and
research. Not only in these sophisticated areas, but also in our daily lives,
computers have become indispensable. They are present in many of the devices that
we use daily, like cars, games, washing machines and microwaves, and in day-to-day
activities like banking, reservations, electronic mail, the internet and many more.
The word computer is derived from the word compute. Compute means to
calculate. The computer was originally defined as a superfast calculator. It had the
capacity to solve complex arithmetic and scientific problems at very high speed.
But nowadays, in addition to handling complex arithmetic computations,
computers perform many other tasks like accepting, sorting, selecting, moving
and comparing various types of information. They also perform arithmetic and
logical operations on alphabetic, numeric and other types of information. The
information provided by the user to the computer is called data, and the data
presented to the computer for processing is called input data.
Therefore, a computer can now be defined as a fast and accurate data processing
system that accepts data, performs various operations on the data, stores the
data and produces results on the basis of detailed step-by-step instructions
given to it.
The terms hardware and software are almost always used in connection with the
computer.
• The Hardware:
The hardware is the machinery itself. It is made up of the physical parts or devices
of the computer system like the electronic Integrated Circuits (ICs), magnetic
storage media and other mechanical devices like input devices, output devices
etc. All this hardware is linked together to form an effective functional
unit. The hardware used in computers has evolved from the vacuum tubes of the
first generation to the Ultra Large Scale Integrated (ULSI) circuits of the
present generation.
• The Software:
The computer hardware is not capable of doing anything on its own; it has to be
given explicit instructions to perform a specific task. These instructions make
up the computer program, which controls the processing activities of the
computer. The computer thus functions according to the instructions written in
the program.
Software Types
The computers of the first generation were very bulky and emitted large amounts
of heat, which required air conditioning. They were cumbersome to handle, had to
be manually assembled and had limited commercial use. The concept of operating
systems was not known at that time. Each computer had a different binary coded
program, called machine language, that told it how to operate.
The abacus, which emerged about 5000 years ago in Asia Minor and is still in use
today, allows users to make computations using a system of sliding beads
arranged on a rack. Early merchants used the abacus to record trading
transactions.
electromechanical relay computer. Electromagnetic signals were used to move its
mechanical parts. Mark I could perform basic arithmetic and solve complex
equations. Although this machine was extremely reliable, it was very slow (it took
about 3-5 seconds per calculation), complex in design and large in size.
EDVAC –
In the mid 1940s Dr. John von Neumann designed the Electronic Discrete
Variable Automatic Computer (EDVAC) with a memory to store both program and data.
This was the first computer designed around the stored program concept. It had
five distinct units: arithmetic, central control, memory, input and output. The
key element was the central control: all the functions of the computer were
coordinated through this single source. The computers were programmed in machine
language.
UNIVAC –
Remington Rand designed this computer specifically for business data processing
applications. The Universal Automatic Computer was the first general purpose
commercially available computer.
The IBM 1401 was widely accepted throughout the industry, and most large
businesses routinely processed financial information using second generation
computers. Machine language was replaced by assembly language, so the long and
difficult binary code gave way to abbreviated programming code that was
relatively easy to understand.
The stored program concept and programming languages gave computers the
flexibility to finally be cost-effective and productive for business use. The
stored program concept meant that the instructions to run a computer for a
specific task were held inside the computer's memory and could quickly be
modified or replaced by a different set of instructions for a different function.
High level languages like COBOL, FORTRAN and ALGOL were developed. Computers
started finding vast and varied applications, and the entire software industry
began with the second generation computers.
Third generation computers made use of Integrated Circuits (ICs) that had 10-20
components on each chip; this was Small Scale Integration (SSI).
The fourth generation realized Large Scale Integration (LSI), which could fit
hundreds of components on one chip, and Very Large Scale Integration (VLSI), which
squeezed thousands of components on one chip. The Intel 4004 chip placed the
central processing unit of a computer on a single chip, and microcomputers were
introduced. Higher capacity storage media like magnetic disks were developed.
Fourth generation languages emerged and application software started becoming
popular.
As computers became more and more powerful, they could be linked together, or
networked, to share not only data but also memory space and software. Networks
grew from local area networks to enormous proportions: the Internet, a global web
of computer circuitry, links computers worldwide into a single network of
information.
Defining the fifth generation computers is somewhat difficult because the field is
still in its infancy. The computers of tomorrow are expected to be characterized by
Artificial Intelligence (AI); Expert Systems are one example of AI. Computers may be
developed that can think and reason in much the same way as humans, and that can
accept spoken words as input (voice recognition).
Many advances in computer design and technology are coming together to enable the
creation of fifth generation computers. Two such advances are parallel processing,
where many CPUs work as one, and advances in superconductor technology, which
allows electricity to flow with little or no resistance, greatly improving the
speed of information flow.
CHAPTER TWO
CLASSIFICATION OF COMPUTERS
Computers are broadly classified into two categories depending on the logic
used in their design:
Analog computers:
In analog computers, data is recognized as a continuous measurement of a
physical property like voltage, speed or pressure. The output is obtained as
readings on a dial or as graphs; for example, voltage, temperature and pressure
can be measured in this way.
Digital Computers:
These are high speed electronic devices that are programmable. They
process data by way of mathematical calculations, comparison, sorting etc. They
accept input and produce output as discrete signals representing the high (on) or
low (off) voltage state of electricity. Numbers, letters and symbols are all
represented as a series of 1s and 0s.
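As a small illustration of how characters become 1s and 0s, the Python sketch below prints the 8-bit pattern of each character's numeric code. It assumes the standard ASCII codes (for example 65 for 'A') and one byte per character; the helper name `to_bits` is hypothetical and not part of any library.

```python
def to_bits(text: str) -> list:
    """Return the 8-bit binary pattern of each character in `text`.

    Assumes each character's code fits in one byte (0-255), which holds
    for plain ASCII letters, digits and symbols.
    """
    patterns = []
    for ch in text:
        code = ord(ch)  # numeric code of the character, e.g. 'A' -> 65
        if code > 255:
            raise ValueError(f"{ch!r} does not fit in a single byte")
        patterns.append(format(code, "08b"))  # zero-padded 8-digit binary string
    return patterns


if __name__ == "__main__":
    for ch, bits in zip("A7?", to_bits("A7?")):
        print(ch, bits)
```

Each character, whether a letter, a digit or a symbol, is stored as the same kind of bit pattern; only the software's interpretation of the pattern differs.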
Digital Computers are further classified as General Purpose Digital Computers and
Special Purpose Digital Computers. General purpose computers can be used for
a wide range of applications like accounts, payroll and data processing. Special
purpose computers are used for a specific job, like those used in automobiles,
microwaves etc.
Another classification of digital computers is based on their size and memory
capacity:
• Small Computers:
I) Microcomputers: Microcomputers are generally referred to as Personal
Computers (PCs). They have less memory and less processing power than larger
computers. They are widely used in day-to-day applications like office
automation and professional applications, e.g. PC/AT, Pentium etc.
II) Note Book and Laptop Computers: These are portable in nature and are
battery operated. Storage devices like CDs, floppies etc. and output devices
like printers can be connected to these computers. Notebook computers
are smaller in physical size than laptop computers. However, both have
powerful processors, support graphics, and can accept mouse driven input.
III) Hand Held Computers: These computers are mainly used in applications like
the collection of field data. They are even smaller than notebook computers.
CHAPTER THREE
APPLICATIONS OF COMPUTERS
Today computers find widespread applications in all activities of the modern
world. Some of the major application areas include:
Business:
Record keeping, budgets, reports, inventory, payroll, invoicing, accounts are all
the areas of business and industry where computers are used to a great extent.
Database management is one of the major areas where computers are used on a
large scale. Application areas here include banking and airline reservations,
where large amounts of data need to be updated, edited, sorted and searched in
large databases.
Medicine:
Computerized systems are now in widespread use for monitoring patient data like
pulse rate and blood pressure, resulting in faster and more accurate diagnosis.
Modern medical equipment is highly computerized, and computers are also widely
used in medical research.
Information:
This is the age of information. Television, Satellite communication, Internet,
networks are all based on computers.
Education:
The use of computers in education is increasing day by day. The students develop
the habit of thinking more logically and are able to formulate problem solving
techniques. CDs on a variety of subjects are available to impart education, and
online training programs for students are also becoming popular. All the
major encyclopedias, dictionaries and books are now available in digital form
and are therefore easily accessible to today's students. Creativity in drawing,
painting, designing, decoration, music etc. can be well developed with computers.
CHAPTER FOUR
ADVANTAGES OF COMPUTERS
Speed:
The speed of a computer is measured in terms of the number of instructions that
it can execute in a second. It is also expressed as the time taken per operation,
in milliseconds ($10^{-3}$ s), microseconds ($10^{-6}$ s) and nanoseconds
($10^{-9}$ s). Computers are superfast machines: even smaller computers can
execute thousands of instructions per second, while more complex machines can
execute millions or more.
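The link between time per instruction and instructions per second is a simple reciprocal. The sketch below assumes, as an idealisation, that every instruction takes the same fixed time (real processors overlap instructions, so actual rates differ); the function name is hypothetical.

```python
# Time units mentioned in the text, expressed in seconds ("us" = microseconds).
UNIT_IN_SECONDS = {"ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def instructions_per_second(time_per_instruction: float, unit: str) -> float:
    """Idealised rate: if every instruction takes the same time, rate = 1 / time."""
    seconds = time_per_instruction * UNIT_IN_SECONDS[unit]
    return 1.0 / seconds


if __name__ == "__main__":
    for unit in ("ms", "us", "ns"):
        rate = instructions_per_second(1, unit)
        print(f"1 {unit} per instruction -> about {rate:,.0f} instructions per second")
```

Each thousand-fold reduction in the time per instruction gives a thousand-fold increase in the number of instructions executed per second.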
Accuracy:
Computers are very accurate. They can execute large numbers of instructions
without error and perform every calculation with the same accuracy. Errors in
their results usually arise from incorrect instructions or input data rather
than from the computer itself.
Efficiency
The efficiency of computers does not decrease with age. Computers can perform
repetitive tasks with the same efficiency any number of times without tiring.
Even if they are instructed to execute millions of instructions, they execute
them all with the same speed and efficiency.
Storage Capability
Computers are capable of storing large amounts of data in their storage devices.
These devices occupy very little space and can store millions of characters in
condensed form. Typical storage devices include floppy disks, tapes, hard disks
and CDs. The data stored on these devices can be retrieved and reused whenever it
is required in future.
Versatility
Computers are very versatile. They are capable not only of performing complex
mathematical tasks of science and engineering, but also of non-numerical
operations such as airline reservations, electricity billing and database
management.
CHAPTER FIVE
LIMITATIONS OF COMPUTERS
Although the computers of today are highly sophisticated, they have their own
limitations. A computer cannot think on its own, since it does not have a brain
of its own. It can only do what it has been programmed to do. It can execute only
those jobs that can be expressed as a finite set of instructions to achieve a
specific goal, and each of the steps has to be clearly defined. Computers do not
learn from previous experience, nor can they arrive at a conclusion without going
through all the intermediate steps. However, the impact of computers on today's
society is phenomenal, and they are today an important part of society.
A COMPUTER SYSTEM
Any system is defined as a group of integrated parts which are designed to
achieve a common objective. Thus, a system is made up of more than one
element or part, where each element performs a specific function and where all
the elements (parts) are logically related and are controlled in such a way that the
goal (purpose) of the system is achieved.
Each of these units performs a specific task. However, none of them can function
on its own. They are logically related and controlled to achieve a specific
goal. When they are integrated in this way, they form a fully-fledged computer
system.
CHAPTER SIX
COMPONENTS OF A COMPUTER SYSTEM
The basic parts of a computer system are:
Input Unit
The Central Processing Unit
Output Unit
Output Unit:
The output devices give the results of the process and computations to the
outside world. The output units accept the results produced by the computer,
convert them into a human readable form and supply them to the users. The
more common output devices are printers, plotters, display screens, magnetic
tape drives etc.
CHAPTER SEVEN
UNITS OF MEASUREMENT
Storage Measurements: The basic unit used in computer data storage is called a
bit (binary digit). Each bit holds a single value, either 0 or 1, and computers
use bits to represent and process data and to communicate with other computers.
All your files, for instance, are kept in the computer as binary files and
translated into words and pictures by the software (which is also stored as ones
and zeros). This two-digit system is called the “binary number system” since it
has only two digits in it. The decimal number system, in contrast, has ten unique
digits, zero through nine.
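Converting a decimal number to binary by repeated division by 2 shows how the two number systems relate. A minimal Python sketch; `decimal_to_binary` is a hypothetical helper, and Python's built-in `bin()` is used only as a cross-check.

```python
def decimal_to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next binary digit, least significant first
        n //= 2
    return "".join(reversed(digits))  # read the remainders in reverse order


if __name__ == "__main__":
    for value in (5, 13, 255):
        # bin() returns strings like '0b101'; strip the '0b' prefix to compare
        assert decimal_to_binary(value) == bin(value)[2:]
        print(value, "->", decimal_to_binary(value))
```

For example, dividing 13 repeatedly by 2 leaves the remainders 1, 0, 1, 1; read in reverse, they give 1101.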
Unit       Symbol   Equivalent
Bit        bit      0 or 1
Byte       B        8 bits
Kilobyte   KB       1024 bytes
Megabyte   MB       1024 kilobytes
Gigabyte   GB       1024 megabytes
Terabyte   TB       1024 gigabytes
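The 1024-based conversions in the table can be applied mechanically. This Python sketch assumes the 1024 convention used in the table (storage manufacturers often use 1000 instead); the function names are hypothetical.

```python
# Units from the table, each 1024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at least 1."""
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024  # move up to the next unit
    return f"{value:.1f} {UNITS[-1]}"


def bytes_to_bits(num_bytes: int) -> int:
    """One byte is made up of 8 bits."""
    return num_bytes * 8


if __name__ == "__main__":
    print(human_readable(4 * 1024))        # about one page of text
    print(human_readable(3 * 1024 ** 2))   # roughly a three-minute song
    print(bytes_to_bits(1), "bits in a byte; largest byte value:", 2 ** 8 - 1)
```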
Size examples
• 1 bit: the answer to a yes/no question.
• 1 byte: a number from 0 to 255.
• 90 bytes: enough to store a typical line of text from a book.
• 4 KB: about one page of text.
• 120 KB: the text of a typical pocket book.
• 3 MB: a three-minute song (128 kbit/s bitrate).
• 650-900 MB: a CD-ROM.
• 1 GB: about 100 minutes of uncompressed CD-quality audio at 1.4 Mbit/s.
• 8-16 GB: the size of a normal flash drive.
STATISTICS
TABLE OF CONTENTS
CHAPTER ONE: DEFINITIONS, SCOPE AND LIMITATIONS
CHAPTER TWO: INTRODUCTION TO SAMPLING METHODS
CHAPTER THREE: COLLECTION OF DATA: CLASSIFICATION AND TABULATION
CHAPTER FOUR: FREQUENCY DISTRIBUTION
CHAPTER FIVE: DIAGRAMMATIC AND GRAPHICAL REPRESENTATION
CHAPTER SIX: MEASURES OF CENTRAL TENDENCY
CHAPTER SEVEN: MEASURES OF DISPERSION, SKEWNESS AND KURTOSIS
CHAPTER EIGHT: CORRELATION
CHAPTER NINE: REGRESSION
CHAPTER TEN: INDEX NUMBERS
CHAPTER ONE
DEFINITIONS, SCOPE AND LIMITATIONS
1.1 Introduction
In the modern world of computers and information technology, the importance of
statistics is well recognized by all disciplines. Statistics originated as a
science of statecraft and found applications slowly and steadily in Agriculture,
Economics, Commerce, Biology, Medicine, Industry, planning, education and so
on. Today there is hardly any walk of human life where statistics cannot be
applied.
1.4 Definitions
Statistics has been defined differently by different authors over time. In the
olden days statistics was confined to state affairs, but today it embraces almost
every sphere of human activity. Therefore, a number of old definitions, which were
confined to a narrow field of enquiry, were replaced by more
Bowley defined statistics as “the science of counting”. This is obviously an
incomplete definition, as it takes into account only the aspect of collection
and ignores other aspects such as analysis, presentation and interpretation.
Bowley gives another definition of statistics, which states that ‘statistics may
rightly be called the science of averages’. This definition is also incomplete:
although averages play an important role in understanding and comparing data,
statistics provides many other measures as well.
1. Collection of Data: It is the first step, and it is the foundation upon which
the entire statistical study rests. Careful planning is essential before
collecting the data. There
1.5.1 Condensation:
Generally speaking, by the word ‘to condense’ we mean to reduce or to lessen.
Condensation aims to help us understand a huge mass of data by means of only a
few summary measures. If only the individual marks of the students of a particular
class in a Chennai school are given, they serve little purpose on their own. If
instead we are given the average mark in that examination, it definitely serves a
better purpose. Similarly, the range of marks is another measure of the data.
Thus, statistical measures help to reduce the complexity of the data and
consequently to understand any huge mass of data.
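The Chennai-school example can be sketched in Python: a list of marks is condensed into its average and its range. The marks below are hypothetical, invented only for illustration, and `condense` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical marks of ten students in one examination (illustrative only).
marks = [45, 67, 72, 38, 90, 55, 61, 78, 84, 50]


def condense(values: list) -> dict:
    """Reduce a mass of observations to two summary measures."""
    return {
        "average": mean(values),             # arithmetic mean of the marks
        "range": max(values) - min(values),  # highest mark minus lowest mark
    }


if __name__ == "__main__":
    print(condense(marks))
```

Ten separate marks are reduced to two numbers that are easier to read and to compare with another class.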
1.5.2 Comparison
Classification and tabulation are the two methods used to condense data, and
they help us to compare data collected from different sources. Grand totals,
measures of central tendency, measures of dispersion, graphs and diagrams, the
coefficient of correlation etc. provide ample scope for comparison.
If we have one group of data, we can compare it within itself. If the rice production
(in Tonnes) in Tanjore district is known, then we can compare one region with
another region within the district. Or if the rice production (in Tonnes) of two
different districts within Tamilnadu is known, then also a comparative study can
be made. As statistics is an aggregate of facts and figures, comparison is always
possible and in fact comparison helps us to understand the data in a better way.
1.5.3 Forecasting:
By the word forecasting, we mean to predict or to estimate beforehand. Given
the data of the last ten years connected to rainfall of a particular district in
Tamilnadu, it is possible to predict or forecast the rainfall for the near future. In
business also forecasting plays a dominant role in connection with production,
sales, profits etc. Time series analysis and regression analysis play an
important role in forecasting.
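One simple regression-based forecast fits a least-squares straight-line trend to past values and extends it one period ahead. The Python sketch below illustrates only that one technique, not a recommended model for rainfall; the yearly figures are hypothetical and the function names are illustrative.

```python
def fit_linear_trend(xs: list, ys: list) -> tuple:
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(a: float, b: float, x: float) -> float:
    """Extend the fitted trend line to a future value of x."""
    return a + b * x


if __name__ == "__main__":
    years = [2015, 2016, 2017, 2018, 2019]
    rainfall_mm = [940, 1000, 990, 1060, 1050]  # hypothetical annual rainfall
    a, b = fit_linear_trend(years, rainfall_mm)
    print(f"trend: {b:.1f} mm per year; forecast for 2020: {forecast(a, b, 2020):.0f} mm")
```

A straight-line trend assumes the past pattern continues; longer series and methods from time series analysis are needed when there are seasonal or cyclical movements.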
1.5.4 Estimation:
One of the main objectives of statistics is to draw inferences about a population
from the analysis of a sample drawn from that population. The four major
branches of statistical inference are:
1. Estimation theory
2. Tests of Hypothesis
3. Non-Parametric tests
4. Sequential analysis
As many multinational companies have entered the Indian economy, the size and
volume of business is increasing. On one hand competition is becoming stiffer,
and on the other hand tastes are changing and new fashions are emerging. In this
connection, market surveys play an important role in revealing present conditions
and forecasting likely changes in the future.
For example, suppose five fertilizers are each applied to five plots of wheat and
the yield of wheat on each plot is recorded. In such a situation, we are
interested in finding out whether the effects of these fertilizers on the yield
differ significantly or not; in other words, whether the samples are drawn from
the same normal population or not. The answer to this problem is provided by the
technique of Analysis of Variance (ANOVA), which is used to test the homogeneity
of several population means.
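The F ratio at the heart of one-way ANOVA compares the variation between fertilizer groups with the variation within them. A minimal from-scratch Python sketch; the yields in the usage example are hypothetical. In practice the F value would be compared with the table value at (k - 1, N - k) degrees of freedom; SciPy's `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
def one_way_anova_f(groups: list) -> float:
    """Return the one-way ANOVA F ratio for a list of groups of observations."""
    k = len(groups)                               # number of treatments (fertilizers)
    n_total = sum(len(g) for g in groups)         # total number of plots
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: spread of group means about the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of each plot about its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)       # mean square between, k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)   # mean square within, N - k degrees of freedom
    return ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical wheat yields from three fertilizers, four plots each
    yields = [[20, 22, 19, 21], [25, 27, 24, 26], [18, 17, 20, 19]]
    print(f"F = {one_way_anova_f(yields):.2f}")
```

A large F means the fertilizer means differ by more than plot-to-plot variation would explain, which is evidence against the homogeneity of the population means.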
Alfred Marshall said, “Statistics are the straw out of which I, like every other
economist, have to make the bricks”. Statistical data and statistical techniques
are immensely useful in solving many economic problems such as wages, prices,
production, and the distribution of income and wealth. Statistical tools like
index numbers, time series analysis, estimation theory and testing of statistical
hypotheses are extensively used in economics.
In order to achieve the above goals, statistical data relating to production,
consumption, demand, supply, prices, investments, income, expenditure etc. and
various advanced statistical techniques for processing, analysing and interpreting
such complex data are of importance. In India, statistics plays an important role
in planning and commissioning at both the central and state government levels.
SYSTAT, a statistical software package, offers more scientific and technical
graphing options than many other desktop statistics packages. SYSTAT supports
scientific and technical research in various diversified fields as follows
CHAPTER TWO
INTRODUCTION TO SAMPLING METHODS
2.1 Introduction
Sampling is very often used in our daily life. For example, while purchasing food
grains from a shop we usually examine a handful from the bag to assess the
quality of the commodity. A doctor examines a few drops of blood as a sample and
draws conclusions about the blood of the whole body. Thus, most of our
investigations are based on samples. In this chapter, let us see the importance
of sampling and the various methods of selecting a sample from the population.
2.2 Population
In a statistical enquiry, all the items, which fall within the purview of enquiry, are
known as Population or Universe. In other words, the population is a complete
set of all possible observations of the type which is to be investigated. All the
students studying in a school or college, all the books in a library and all the
houses in a village or town are some examples of populations.
Limitations
1. It requires a large number of enumerators and is a costly method.
2. It requires more money, labour, time, energy etc.
3. It is not possible in circumstances where the universe is infinite.
2.3 Sampling
Although the theory of sampling has been developed only recently, the idea of
sampling is not new. In everyday life we have been using sampling, as discussed in
the introduction. In all those cases we believe that the samples give a correct
idea about the population. Most of our decisions are based on the examination of
a few items, that is, on sample studies.
2.3.1 Sample
Statisticians use the word sample to describe a portion chosen from the
population. A finite subset of statistical individuals defined in a population is
called a sample. The number of units in a sample is called the sample size.
Sampling unit
The constituents of a population that are to be sampled, and that cannot be
further subdivided for the purpose of sampling, are called sampling units. For
example, to know the average income per
family, the head of the family is a sampling unit. To know the average yield of rice,
each farm owner’s yield of rice is a sampling unit.
Sampling frame
For adopting any sampling procedure it is essential to have a list identifying each
sampling unit by a number. Such a list or map is called sampling frame. A list of
voters, a list of householders, a list of villages in a district, a list of farmers etc. are
a few examples of sampling frame.
2.3.2 Reasons for selecting a sample
Sampling is inevitable in the following situations:
In general, we use Greek or capital letters for population parameters and lower
case Roman letters to denote sample statistics. [$N$, $\mu$, $\sigma$ are the
standard symbols for the size, mean and standard deviation of the population;
$n$, $\bar{x}$, $s$ are the standard symbols for the size, mean and standard
deviation of the sample, respectively.]
Other things being equal, as the sample size increases, the results tend to be
more accurate and reliable.
3. Principle of Validity
This states that the sampling methods provide valid estimates of the
population parameters.
4. Principle of Optimization
This principle takes into account the desirability of obtaining a sampling design
that gives optimum results, that is, one which minimizes the risk or loss of the
sampling design.
1. Sampling errors
Although a sample is a part of the population, it cannot generally be expected
to supply full information about the population. So, in most cases there will be
a difference between statistics and parameters. The discrepancy between a
parameter and its estimate due to the sampling process is known as sampling error.
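Sampling error can be sketched in Python by drawing a simple random sample from a small population and comparing the statistic x̄ with the parameter µ. The population of the values 1 to 100 is hypothetical, chosen only so that µ = 50.5 is easy to check; `statistics.pstdev` and `statistics.stdev` give σ and s respectively, and `sampling_error` is a hypothetical helper name.

```python
import random
import statistics


def sampling_error(population: list, n: int, seed: int = 0) -> dict:
    """Compare a sample statistic with the corresponding population parameter."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)           # simple random sample without replacement
    mu = statistics.mean(population)             # parameter: population mean
    x_bar = statistics.mean(sample)              # statistic: sample mean
    return {
        "mu": mu,
        "sigma": statistics.pstdev(population),  # population standard deviation
        "x_bar": x_bar,
        "s": statistics.stdev(sample),           # sample standard deviation (divisor n - 1)
        "sampling_error": x_bar - mu,
    }


if __name__ == "__main__":
    population = list(range(1, 101))  # hypothetical population, mu = 50.5
    for n in (5, 20, 80):
        result = sampling_error(population, n, seed=1)
        print(n, round(result["sampling_error"], 2))
```

When the "sample" is the whole population, the sampling error is exactly zero; for genuine samples it varies from draw to draw.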
2. Non-sampling errors:
In all surveys, some errors may occur during the collection of actual information.
These errors are called non-sampling errors.
1. Probability sampling.
2. Non-probability sampling.
3. Mixed sampling.
2.4.1 Probability sampling (Random Sampling)
A probability sample is one where the selection of units from the population is
made according to known probabilities, e.g. simple random sampling and sampling
with probability proportional to size.
a) Lottery Method:
This is the most popular and simplest method. In this method, all the items of the
population are numbered on separate slips of paper of the same size, shape and
colour. The slips are folded and mixed up in a container, and the required number
of slips is drawn at random to obtain the desired sample size. For example, to
select 5 students out of 50, we write the names or roll numbers of all 50 students
on slips, mix them, and then make a random selection of 5 students.
This method is mostly used in lottery draws. If the universe is infinite, this
method is inapplicable.
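The lottery method maps directly onto shuffling a list. A minimal Python sketch of the 5-out-of-50 example, with Python's pseudo-random generator standing in for the physical mixing of slips (an assumption: a computer generator is not literally a lottery draw); `lottery_sample` is a hypothetical name, and `random.sample` would do the same job in a single call.

```python
import random


def lottery_sample(roll_numbers, n: int, seed=None) -> list:
    """Simulate the lottery method: one slip per unit, mix, draw n slips."""
    slips = list(roll_numbers)   # one slip for every unit in the population
    rng = random.Random(seed)
    rng.shuffle(slips)           # mix the folded slips in the container
    return slips[:n]             # draw n slips; none is put back, so no repeats


if __name__ == "__main__":
    print(lottery_sample(range(1, 51), 5))  # 5 students out of roll numbers 1-50
```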
Three well-known random number tables are:
1. Tippett’s table
2. Fisher and Yates’ table
3. Kendall and Smith’s table
A random number table is so constructed that all digits 0 to 9 appear independently
of each other with equal frequency. If we have to select a sample from a population
of size N = 100, then the digits can be combined three by three to give the
numbers from 001 to 100.
digit number 000, 001, 002, …, 999 are assigned. We may start at any place and
may go on in any direction, such as column-wise or row-wise, in a random number
table. But consecutive numbers are to be used.
On the basis of the size of the population and the random number table available
with us, we proceed according to our convenience. If any random number is
greater than the population size N, then N can be subtracted from the random
number drawn.
203 023 277 353 100 294 109 179 272 284 450 141 148 408 280
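The rule above for using a random number table can be written as a short Python function. As an illustration it is applied to the three-digit numbers listed above with an assumed population size N = 200 and sample size 5; those values of N and n are hypothetical. The subtraction is applied repeatedly until the number is at most N, and skipping 000 and repeated serial numbers (sampling without replacement) is an assumption of this sketch.

```python
def select_from_table(random_numbers: list, population_size: int, sample_size: int) -> list:
    """Pick serial numbers 1..N from random-number-table entries, in the order read."""
    chosen = []
    for r in random_numbers:
        # Rule from the text: if a number exceeds N, subtract N (repeatedly if needed)
        while r > population_size:
            r -= population_size
        # Assumption: 000 is not a valid serial number, and repeats are skipped
        if r == 0 or r in chosen:
            continue
        chosen.append(r)
        if len(chosen) == sample_size:
            break
    return chosen


if __name__ == "__main__":
    # Three-digit numbers from the table above (023 is written as 23 in Python)
    table = [203, 23, 277, 353, 100, 294, 109, 179, 272, 284, 450, 141, 148, 408, 280]
    print(select_from_table(table, population_size=200, sample_size=5))
```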
The population size is denoted by N and the sample size by n. The sample size is
allocated to each stratum in such a way that the sampling fraction is constant for
each stratum, that is, $n/N = c$. So in this method each stratum is represented
according to its size.
$n_1 = \frac{n \times N_1}{N} = \frac{50 \times 200}{500} = 20$
$n_2 = \frac{n \times N_2}{N} = \frac{50 \times 300}{500} = 30$
The sample sizes are 20 from A and 30 from B. Then the units from each
institution are to be selected by simple random sampling.
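Proportional allocation is a one-line calculation per stratum. A minimal Python sketch reproducing the example above (n = 50, strata of 200 and 300); the function name is hypothetical, and rounding to whole units is an assumption, since n × N_i / N need not be an integer and the rounded sizes may then not add up exactly to n.

```python
def proportional_allocation(n: int, stratum_sizes: list) -> list:
    """Allocate a total sample of n units so that each stratum gets n * N_i / N."""
    N = sum(stratum_sizes)  # population size
    # Constant sampling fraction n / N applied to every stratum, rounded to whole units
    return [round(n * N_i / N) for N_i in stratum_sizes]


if __name__ == "__main__":
    print(proportional_allocation(50, [200, 300]))  # institutions A and B
```

After allocation, the units within each stratum are still chosen by simple random sampling, for instance with the lottery method.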
Limitations
1. Dividing the population into homogeneous strata requires more money, time
and statistical experience, which makes it difficult.
2. Improper stratification leads to bias: if the different strata overlap, the
sample will not be a representative one.
2.5.3 Systematic Sampling:
This method is widely employed because of its ease and convenience. Systematic
sampling is frequently used when a complete list of the population is available.
It is also called quasi-random sampling.
Selection Procedure
The whole sample selection is based on just a random start. The first unit is
selected with the help of random numbers, and the rest are selected automatically
according to some predesigned pattern. With systematic random sampling, every
Kth element in the frame is selected for the sample, with the starting point
among the first K elements determined at random.
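The selection procedure can be sketched in Python. It assumes the interval K is N / n rounded down when N is not a multiple of n (one common convention); the function name and the frame of serial numbers 1 to 100 are hypothetical.

```python
import random


def systematic_sample(frame: list, n: int, seed=None) -> list:
    """Select every K-th unit from a random start among the first K units."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                        # sampling interval K = N / n, rounded down
    start = random.Random(seed).randrange(k)   # random start: position 0..K-1
    return frame[start::k][:n]                 # every K-th unit from the start


if __name__ == "__main__":
    frame = list(range(1, 101))  # hypothetical frame of 100 serial numbers
    print(systematic_sample(frame, 10))
```

Only the start is random; once it is fixed, every other unit in the sample is determined, which is why the method is called quasi-random.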
Limitations
1. Systematic sampling may not represent the whole population, for example when the list has a periodic pattern that coincides with the sampling interval K.
CHAPTER THREE
COLLECTION OF DATA:
CLASSIFICATION AND TABULATION
3.1 Introduction
Everybody collects, interprets and uses information, much of it in numerical or
statistical form, in day-to-day life. People receive large quantities of
information every day through conversations, television, computers, radio,
newspapers, posters, notices and instructions. It is just because there is so much
information available that people need to be able to absorb, select and reject it.
In everyday life, in business and in industry, certain statistical information is
necessary, and it is important to know where to find it and how to collect it.
For example, everybody has to compare prices and quality before deciding what
goods to buy. As employees of any firm, people want to compare their salaries,
working conditions, promotion opportunities and so on. Firms, for their part, want
to control costs and expand their profits.
One of the main functions of statistics is to provide information which will help in
making decisions. Statistics provides this type of information by giving a
description of the present, a profile of the past and an estimate of the future. The
following are some of the objectives of collecting statistical information.
analysis and using these results for making judgements, decisions and predictions.
The validity and accuracy of the final judgement are most crucial and depend
heavily on how well the data were collected in the first place. The quality of the
data will greatly affect the conclusions, and hence utmost importance must be
given to this process and every possible precaution should be taken to ensure
accuracy while collecting the data.
3.2 Nature of data:
It may be noted that different types of data can be collected for different
purposes. The data can be collected in connection with time or geographical
location or in connection with time and location. The following are the three
types of data:
Example 1:
The following are the data for three types of expenditure (in rupees) of a family
for the four years 2001, 2002, 2003 and 2004.
If the data collected are connected with a place, they are termed spatial data.
For example, the data may be:
State              Population
Tamil Nadu         5,56,38,318
Andhra Pradesh     6,63,04,854
Karnataka          4,48,17,398
Kerala             2,90,11,237
Pondicherry        7,89,416
State              Population (1981)    Population (1991)
Tamil Nadu         4,82,97,456          5,56,38,318
Andhra Pradesh     5,34,03,619          6,63,04,854
Karnataka          3,70,43,451          4,48,17,398
Kerala             2,54,03,217          2,90,11,237
Pondicherry        6,04,136             7,89,416
Example 4
If a researcher is interested in knowing the impact of the noon-meal scheme for
school children, he has to undertake a survey and collect data on the opinions of
parents and children by asking relevant questions. Data collected for such a
purpose are called primary data.
Limitations
1. It is very costly and time consuming.
2. It is very difficult, when the number of persons to be interviewed is large
and the persons are spread over a wide area.
3. Personal prejudice and bias are greater under this method.
2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses or neighbours or friends or
some other third parties who are capable of supplying the necessary information.
This method is preferred if the required information is on addiction or the cause
of a fire, theft or murder, etc. If a fire has broken out at a certain place, the
persons living in the neighbourhood and witnesses are likely to give information
on the cause of the fire. In some cases, the police interrogate third parties who
are supposed to have knowledge of a theft or a murder and get some clues.
Enquiry committees appointed by governments generally adopt this method and
get people’s views and all possible details of facts relating to the enquiry. This
method is suitable whenever direct sources do not exist or cannot be relied upon,
or when they would be unwilling to part with the information.
The validity of the results depends upon a few factors, such as the nature of the
person whose evidence is being recorded, the ability of the interviewer to draw
out information from the third parties by means of appropriate questions and
cross-examination, and the number of persons interviewed. For the success of
this method, no single person or group should be relied upon.
3. Information from correspondents
with the investigation. It should assure respondents that the information would be
kept confidential and would never be misused. It may promise a copy of the
findings, free gifts, concessions, etc.
Merits
Limitations
4. Information collected as primary data is more reliable than that obtained
from secondary data.
3.3.2 Secondary Data:
Secondary data are those data which have been already collected and analyzed by
some earlier agency for its own use; and later the same data are used by a
different agency. According to W.A. Neiswanger, ‘A primary source is a
publication in which the data are published by the same authority which gathered
and analyzed them. A secondary source is a publication, reporting the data which
have been gathered by other authorities and for which others are responsible’.
1. Published Sources:
The various sources of published data are:
It should be noted that the publications mentioned above vary with regard to
their periodicity of publication. Some are published at regular intervals (yearly,
monthly, weekly, etc.), whereas others are ad hoc publications, i.e., with no
regular periodicity.
Note: A lot of secondary data is available on the internet and can be accessed at
any time for further studies.
2. Unpublished Sources
Not all statistical material is published. There are various sources of
unpublished data, such as records maintained by various Government and private
offices, and studies made by research institutions, scholars, etc. Such sources can
also be used where necessary.
3.4 Classification:
The collected data, also known as raw data or ungrouped data, are always in an
unorganized form and need to be organized and presented in a meaningful and
readily comprehensible form in order to facilitate further statistical analysis. It is,
therefore, essential for an investigator to condense a mass of data into a more
comprehensible and assimilable form. The process of grouping data into different
classes or sub-classes according to some characteristics is known as
classification, while tabulation is concerned with the systematic arrangement and
presentation of classified data. Thus, classification is the first step in tabulation.
For example, letters in the post office are classified according to their
destinations, viz., Delhi, Madurai, Bangalore, Mumbai, etc.
Objects of Classification
The following are main objectives of classifying the data:
Types of Classification
Statistical data are classified in respect of their characteristics. Broadly, there
are four basic types of classification, namely:
a) Chronological classification
b) Geographical classification
c) Qualitative classification
d) Quantitative classification
a) Chronological Classification
In chronological classification, the collected data are arranged according to the
order of time, expressed in years, months, weeks, etc. The data are generally
classified in ascending order of time. For example, data related to population,
sales of a firm, and imports and exports of a country are usually subjected to
chronological classification.
Example 5:
The estimates of birth rates in India during 1970 – 76 are
Year 1970 1971 1972 1973 1974 1975 1976
Birth Rate 36.8 36.9 36.6 34.6 34.5 35.2 34.2
b) Geographical Classification
In this type of classification, the data are classified according to geographical
region or place. For instance, the production of paddy in different states in India,
the production of wheat in different countries, etc.
Example 6:
c) Qualitative Classification
In this type of classification, data are classified on the basis of some attribute or
quality such as sex, literacy, religion, employment, etc. Such attributes cannot be
measured on a scale.
For example, if the population is to be classified in respect of one attribute, say
sex, then we can classify it into two classes, namely males and females. Similarly,
it can also be classified into ‘employed’ and ‘unemployed’ on the basis of
another attribute, ‘employment’.
Thus, when the classification is done with respect to one attribute, which is
dichotomous in nature, two classes are formed, one possessing the attribute and
the other not possessing the attribute. This type of classification is called simple
or dichotomous classification.
[Diagram: Population divided into Male and Female]
The classification where two or more attributes are considered and several
classes are formed is called a manifold classification. For example, if we classify
a population simultaneously with respect to two attributes, e.g. sex and
employment, then the population is first classified with respect to ‘sex’ into
‘males’ and ‘females’. Each of these classes may then be further classified into
‘employed’ and ‘unemployed’ on the basis of the attribute ‘employment’, and as
such the population is classified into four classes, namely employed males,
unemployed males, employed females and unemployed females.
[Diagram: Population divided into Male and Female, each further divided into Employed and Unemployed]
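The difference between simple and manifold classification can be illustrated by counting records in Python. The eight records below are hypothetical, invented only for illustration; `collections.Counter` does the grouping.

```python
from collections import Counter

# Hypothetical records (sex, employment status) for eight persons, for illustration only
records = [
    ("Male", "Employed"), ("Female", "Unemployed"), ("Male", "Employed"),
    ("Female", "Employed"), ("Male", "Unemployed"), ("Female", "Employed"),
    ("Female", "Unemployed"), ("Male", "Employed"),
]

# Simple (dichotomous) classification on one attribute: two classes
by_sex = Counter(sex for sex, _ in records)

# Manifold classification on two attributes: four classes
by_sex_and_employment = Counter(records)

for (sex, status), count in sorted(by_sex_and_employment.items()):
    print(f"{sex}, {status}: {count}")
```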
d) Quantitative classification
Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height, weight, etc. For example, the
students of a college may be classified according to weight as given below.
In this type of classification there are two elements, namely (i) the variable, i.e.
the weight in the above example, and (ii) the frequency, i.e. the number of students
in each class. There are 50 students having weights ranging from 90 to 100 lb, 200
students having weights ranging from 100 to 110 lb, and so on.
3.5 Tabulation
Tabulation is the process of summarizing classified or grouped data in the form of
a table so that it is easily understood and an investigator is quickly able to locate
the desired information. A table is a systematic arrangement of classified data in
columns and rows. Thus, a statistical table makes it possible for the investigator
to present a huge mass of data in a detailed and orderly form. It facilitates
comparison and often reveals certain patterns in data which are otherwise not
obvious. Classification and tabulation, as a matter of fact, are not two distinct
processes; they go together. Data are first classified and then displayed under
different columns and rows of a table.
Advantages of Tabulation:
Statistical data arranged in a tabular form serve the following objectives:
1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages,
dispersion, correlation etc.
4. It presents facts in minimum possible space and unnecessary repetitions
and explanations are avoided. Moreover, the needed information can be
easily located.
5. Tabulated data are good for references and they make it easier to present
the information in the form of graphs and diagrams.
Preparing a Table
The making of a compact table is itself an art. A table should contain all the
information needed within the smallest possible space. The purpose of
tabulation and how the tabulated information is to be used are the main points
to be kept in mind while preparing a statistical table. An ideal table should
consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data
Table Number
A table should be numbered for easy reference and identification. This number, if
possible, should be written in the centre at the top of the table. Sometimes it is
also written just before the title of the table.
Title
A good table should have a clearly worded, brief but unambiguous title explaining
the nature of data contained in the table. It should also state arrangement of data
and the period covered. The title should be placed centrally on the top of a table
just below the table number (or just after table number in the same line).
Captions or Column Headings
[Figure: Format of a table, showing headings and sub-headings, body, totals, footnotes and source note]
Body:
The body of the table contains the numerical information of frequency of
observations in the different cells. This arrangement of data is according to the
description of captions and stubs.
Footnotes
Footnotes are given at the foot of the table to explain any fact or information
included in the table which needs some explanation. Thus, they are meant for
explaining or providing further details about the data that have not been
covered in the title, captions and stubs.
Sources of data
Lastly one should also mention the source of information from which data are
taken. This may preferably include the name of the author, volume, page and the
year of publication. This should also state whether the data contained in the table
is of ‘primary or secondary’ nature.
Requirements of a Good Table
A good statistical table is not merely a careless grouping of columns and rows but
should be such that it summarizes the total information in an easily accessible
form in minimum possible space. Thus, while preparing a table, one must have a
clear idea of the information to be presented, the facts to be compared and the
points to be stressed.
Though there is no hard and fast rule for forming a table, a few general points
should be kept in mind:
7. The rows and columns are separated by single, double or thick lines to
represent various classes and sub-classes used. The corresponding
proportions or percentages should be given in adjoining rows and columns
to enable comparison. A vertical expansion of the table is generally more
convenient than the horizontal one.
8. The averages or totals of different rows should be given at the right of the
table and that of columns at the bottom of the table. Totals for every sub-
class too should be mentioned.
9. In case it is not possible to accommodate all the information in a single
table, it is better to have two or more related tables.
Type of Tables
Tables can be classified according to their purpose, stage of enquiry, nature of
data or number of characteristics used. On the basis of the number of
characteristics, tables may be classified as follows:
3. Manifold table
Two-way Table
A table, which contains data on two characteristics, is called a two-way table. In
such a case, therefore, either the stub or the caption is divided into two
co-ordinate parts. In the given table, as an example, the caption may be further
divided in respect of ‘sex’. This subdivision is shown in the two-way table, which
now contains two characteristics, namely occupation and sex.
Manifold Table
Thus, more and more complex tables can be formed by including other
characteristics. For example, we may further classify the caption sub-headings in
the above table in respect of “marital status”, “religion” and “socio-economic
status” etc. A table which has more than two characteristics of data is
considered a manifold table. For instance, the table shown below has three
characteristics, namely occupation, sex and marital status.
Footnote: M stands for married and U stands for unmarried.
Manifold tables, though complex, are good in practice as they enable full
information to be incorporated and facilitate analysis of all related facts. Still, as
a normal practice, not more than four characteristics should be represented in
one table, to avoid confusion. Other related tables may be formed to show the
remaining characteristics.
CHAPTER FOUR
FREQUENCY DISTRIBUTION
4.1 Introduction
A frequency distribution is a series in which a number of observations with
similar or closely related values are put into separate groups, the groups being
arranged in order of magnitude. It is simply a table in which the data are grouped
into classes and the number of cases falling in each class is recorded. It shows
the frequency of occurrence of different values of a single phenomenon.
The above figures are nothing but raw or ungrouped data, recorded as they occur
without any pre-consideration. This representation of data does not furnish any
useful information and is rather confusing to the mind. A better way is to arrange
the figures in ascending or descending order of magnitude, which is commonly
known as an array. But this does not reduce the bulk of the data. The above data,
when formed into an array, take the following form:
30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 65 70 70
75 75 75 80 80 80 80 85 90 90
The array helps us to see at once the maximum and minimum values. It also
gives a rough idea of the distribution of the items over the range. When we have
a large number of items, the formation of an array is very difficult, tedious and
cumbersome. The condensation should be directed towards better understanding
and may be done in two ways, depending on the nature of the data.
a) Discrete (or)
In this form of distribution, the frequency refers to discrete values. Here the data
are presented in a way that exact measurements of units are clearly indicated.
The process of preparing this type of distribution is very simple. We have just to
count the number of times a particular value is repeated, which is called the
frequency of that class. In order to facilitate counting, prepare a column of
tallies.
In another column, place all possible values of the variable from the lowest to the
highest. Then put a bar (vertical line) opposite the particular value to which each
observation relates.
To facilitate counting, blocks of five bars are prepared and some space is left
between each block. We finally count the number of bars and get the frequency.
Example 1
In a survey of 40 families in a village, the number of children per family was
recorded and the following data were obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
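The tally procedure described above can be sketched in Python for these 40 observations. The string "||||/" is our own plain-text stand-in for a block of four bars crossed by a fifth; the helper name `tally` is hypothetical.

```python
from collections import Counter

# Number of children per family for the 40 families of Example 1
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]


def tally(count):
    """Tally marks in blocks of five; '||||/' stands for one completed block."""
    blocks = ["||||/"] * (count // 5)
    if count % 5:
        blocks.append("|" * (count % 5))
    return " ".join(blocks)


freq = Counter(children)
for value in range(min(children), max(children) + 1):
    print(f"{value:>2}  {tally(freq[value]):<14} {freq[value]}")
print("Total", sum(freq.values()))
```

Run on the data above, it gives frequencies 3, 7, 10, 8, 6, 4 and 2 for families with 0 to 6 children, a total of 40.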
Solution:
Frequency distribution of the number of children
50-100 4
100-150 12
150-200 22
200-250 33
250-300 16
300-350 8
350-400 5
Total 100
a) Class Limits
The class limits are the lowest and the highest values that can be included in the
class. For example, take the class 30-40. The lowest value of the class is 30 and
the highest value is 40. The two boundaries of a class are known as the lower limit
and the upper limit of the class. The lower limit of a class is the value below which
there can be no item in the class. The upper limit of a class is the value above
which there can be no item in that class. Of the class 60-79, 60 is the lower limit
and 79 is the upper limit, i.e. in this case there can be no value which is less than
60 or more than 79. The way in which class limits are stated depends upon the
nature of the data. In statistical calculations, the lower class limit is denoted by L
and the upper class limit by U.
b) Class Interval
The class interval may be defined as the size of each grouping of data. For
example, 50-75, 75-100, 100-125…are class intervals. Each grouping begins with
the lower limit of a class interval and ends at the lower limit of the next
succeeding class interval.
c) Width or size of the class interval:
The difference between the upper and lower class limits is called the width or size
of the class interval and is denoted by ‘C’, i.e. C = U − L.
d) Range:
The difference between the largest and the smallest value of the observations is
called the range and is denoted by ‘R’, i.e.
R = Largest value − Smallest value = L − S
e) Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is
found by adding the upper and lower limits of a class and dividing the sum by 2:
$$\text{Mid-value} = \frac{L + U}{2}$$
For example, the mid-value of the class 20-30 is (20 + 30)/2 = 25.
f) Frequency
Number of observations falling within a particular class interval is called
frequency of that class.
Weight (in kg)    Number of persons
30-40             25
40-50             53
50-60             77
60-70             95
70-80             80
80-90             60
90-100            30
Total             420
In the above example, the class frequencies are 25, 53, 77, 95, 80, 60 and 30. The
total frequency is 420; it indicates the total number of observations considered in
a frequency distribution.
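The definitions of width, mid-value and total frequency can be checked on the weight table with a short Python sketch; the variable names are illustrative only.

```python
# Weight classes (in kg) and number of persons, as in the table above
classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
frequencies = [25, 53, 77, 95, 80, 60, 30]

widths = [upper - lower for lower, upper in classes]             # C = U - L
mid_values = [(lower + upper) / 2 for lower, upper in classes]   # (L + U) / 2
N = sum(frequencies)                                             # total frequency

for (lower, upper), c, m, f in zip(classes, widths, mid_values, frequencies):
    print(f"{lower}-{upper}: width {c}, mid-value {m}, frequency {f}")
print("Total frequency:", N)  # 420
```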
number of class intervals can vary from 5 to 15. To decide the number of class
intervals for the frequency distribution of the whole data, we choose the lowest
and the highest of the values. The difference between them will enable us to
decide the class intervals.
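Thus the number of class intervals can be fixed arbitrarily, keeping in view the
nature of the problem under study, or it can be decided with the help of Sturges’
rule. According to this rule, the number of classes can be determined by the
formula
$$K = 1 + 3.322 \log_{10} N$$
where N is the total number of observations. Thus, if the number of observations
is 10, then the number of class intervals is K = 1 + 3.322 log10 10 = 4.322 ≈ 4.
The size of each class interval is then obtained as
$$C = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$$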
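Sturges’ rule and the resulting class width translate directly into two small Python functions. The names `sturges_classes` and `class_width` are hypothetical helpers; both return unrounded values, leaving the rounding to the investigator.

```python
import math


def sturges_classes(N):
    """Number of classes by Sturges' rule, K = 1 + 3.322 log10 N (before rounding)."""
    return 1 + 3.322 * math.log10(N)


def class_width(data):
    """Size of class interval, C = Range / (1 + 3.322 log10 N) (before rounding)."""
    data_range = max(data) - min(data)   # R = largest value - smallest value
    return data_range / sturges_classes(len(data))


print(round(sturges_classes(10), 3))   # 4.322
print(round(sturges_classes(100), 3))  # 7.644
```

In practice K and C are rounded to convenient whole numbers, as in the examples that follow.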
a) Exclusive method
b) Inclusive method
c) Open-end classes
a) Exclusive method
When the class intervals are so fixed that the upper limit of one class is the lower
limit of the next class; it is known as the exclusive method of classification. The
following data are classified on this basis.
Expenditure (Rs.) No. of families
0 - 5000 60
5000-10000 95
10000-15000 122
15000-20000 83
20000-25000 40
Total 400
It is clear that the exclusive method ensures continuity of data, inasmuch as the
upper limit of one class is the lower limit of the next class. In the above example,
there are 60 families whose expenditure is between Rs.0 and Rs.4999.99. A
family whose expenditure is Rs.5000 would be included in the class interval
5000-10000. This method is widely used in practice.
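The rule that a value equal to an upper limit belongs to the next class can be sketched with Python's `bisect` module. The helper `exclusive_class` is a hypothetical name, and the class limits are those of the expenditure table above.

```python
import bisect

# Lower limits of the exclusive classes 0-5000, 5000-10000, ..., 20000-25000 (Rs.)
lower_limits = [0, 5000, 10000, 15000, 20000]
upper_end = 25000


def exclusive_class(value):
    """Return the class (lower, upper) with lower <= value < upper."""
    if not lower_limits[0] <= value < upper_end:
        raise ValueError("value lies outside the classes")
    i = bisect.bisect_right(lower_limits, value) - 1   # last lower limit <= value
    upper = lower_limits[i + 1] if i + 1 < len(lower_limits) else upper_end
    return lower_limits[i], upper


print(exclusive_class(4999.99))  # (0, 5000)
print(exclusive_class(5000))     # (5000, 10000): the upper limit belongs to the next class
```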
b) Inclusive method
In this method, the overlapping of the class intervals is avoided: both the lower
and the upper limits are included in the class interval. This type of classification
may be used for a grouped frequency distribution of a discrete variable, such as
the number of members in a family or the number of workers in a factory, where
the variable takes only integral values. It cannot be used with variables that take
fractional values, such as age, height and weight.
Below 2000 7
2000 – 4000 5
4000 – 6000 6
6000 – 8000 4
8000 and above 3
42 62 46 54 41 37 54 44 32 45
47 50 58 49 51 42 46 37 42 39
54 39 51 58 47 64 43 48 49 48
49 61 41 40 58 49 59 57 57 34
56 38 45 52 46 40 63 41 51 41
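Here the largest value is 64 and the smallest is 32, so the range is 32. The number
and size of the class intervals as per Sturges’ rule are obtained as follows:
K = 1 + 3.322 log10 50 = 1 + 3.322 × 1.699 = 6.64 ≈ 7
C = 32 / 6.64 = 4.82 ≈ 5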
Thus, the number of class intervals is 7 and the size of each class is 5. The
required frequency distribution, prepared using tally marks, is given below:
Class Interval    Frequency
30-35             2
35-40             5
40-45             11
45-50             13
50-55             8
55-60             7
60-65             4
Total             50
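Counting by machine is a useful check on hand tallying. The sketch below bins the 50 values listed above into the exclusive classes 30-35, 35-40, ..., 60-65, counting a value equal to a class's upper limit in the next class; the variable names are illustrative.

```python
import math

values = [
    42, 62, 46, 54, 41, 37, 54, 44, 32, 45,
    47, 50, 58, 49, 51, 42, 46, 37, 42, 39,
    54, 39, 51, 58, 47, 64, 43, 48, 49, 48,
    49, 61, 41, 40, 58, 49, 59, 57, 57, 34,
    56, 38, 45, 52, 46, 40, 63, 41, 51, 41,
]

k = 1 + 3.322 * math.log10(len(values))   # Sturges: about 6.64, taken as 7 classes
c = (max(values) - min(values)) / k       # about 4.82, taken as 5
start, width, n_classes = 30, 5, 7

counts = [0] * n_classes
for v in values:
    counts[(v - start) // width] += 1      # exclusive method: 35 goes into 35-40

for i, f in enumerate(counts):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {f}")
print("Total:", sum(counts))
```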
Example 2:
Given below are the numbers of tools produced by workers in a factory.
43 18 25 18 39 44 19 20 20 26
40 45 38 25 13 14 27 41 42 17
34 31 32 27 33 37 25 26 32 25
33 34 35 46 29 34 31 34 35 24
28 30 41 32 29 28 30 31 30 34
31 35 36 29 26 32 36 35 36 37
32 23 22 29 33 37 33 27 24 36
23 42 29 37 29 23 44 41 45 39
21 21 42 22 28 22 15 16 17 28
22 29 35 31 27 40 23 32 40 37
Construct a frequency distribution with inclusive type of class intervals. Also, find.
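Here N = 100, the largest value is 46 and the smallest is 13, so the range is
46 − 13 = 33.
K = 1 + 3.322 log10 100 = 7.644 ≈ 7.6
C = 33 / 7.644 = 4.3 ≈ 5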
Hence, taking the size of each class interval as 5, we have 7 classes of the
inclusive type: 13-17, 18-22, …, 43-47. Using tally marks, the required frequency
distribution is obtained in the following table:
0-10 3 6
10-20 8 16
20-30 12 24
30-40 17 34
40-50 6 12
50-60 4 8
Total 50 100
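The inclusive classification of the 100 values in Example 2 can likewise be checked with a short Python sketch, in which both limits of each class are counted as belonging to it. The variable names are illustrative only.

```python
# Number of tools produced by each of 100 workers (Example 2)
tools = [
    43, 18, 25, 18, 39, 44, 19, 20, 20, 26,
    40, 45, 38, 25, 13, 14, 27, 41, 42, 17,
    34, 31, 32, 27, 33, 37, 25, 26, 32, 25,
    33, 34, 35, 46, 29, 34, 31, 34, 35, 24,
    28, 30, 41, 32, 29, 28, 30, 31, 30, 34,
    31, 35, 36, 29, 26, 32, 36, 35, 36, 37,
    32, 23, 22, 29, 33, 37, 33, 27, 24, 36,
    23, 42, 29, 37, 29, 23, 44, 41, 45, 39,
    21, 21, 42, 22, 28, 22, 15, 16, 17, 28,
    22, 29, 35, 31, 27, 40, 23, 32, 40, 37,
]

# Inclusive classes 13-17, 18-22, ..., 43-47: both limits belong to the class
classes = [(lower, lower + 4) for lower in range(13, 48, 5)]
frequency = {(lo, hi): sum(lo <= t <= hi for t in tools) for lo, hi in classes}

for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(frequency.values()))
```

Run on the data above, it gives the frequencies 6, 11, 17, 25, 23, 12 and 6 for the seven classes, a total of 100.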
4.8 Cumulative Frequency Table
A cumulative frequency distribution gives a running total of the frequencies. It is
constructed by adding the frequency of the first class interval to the frequency of
the second class interval, then adding that total to the frequency of the third class
interval, and so on, until the final total, appearing opposite the last class interval,
equals the total of all frequencies. The cumulation may be downward or upward.
A downward cumulation results in a list presenting the number of frequencies
‘less than’ any given amount, as revealed by the lower limit of the succeeding
class interval, and an upward cumulation results in a list presenting the number
of frequencies ‘more than’ any given amount, as revealed by the upper limit of the
preceding class interval.
Example 3:
Age group    Number of   Less than cumulative   More than cumulative
(in years)   women       frequency              frequency
15-20        3           3                      64
20-25        7           10                     61
25-30        15          25                     54
30-35        21          46                     39
35-40        12          58                     18
40-45        6           64                     6
Less than 20 3
Less than 25 10
Less than 30 25
Less than 35 46
Less than 40 58
Less than 45 64
15 and above 64
20 and above 61
25 and above 54
30 and above 39
35 and above 18
40 and above 6
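Both cumulations can be sketched with `itertools.accumulate`: the ‘less than’ column is a running total from the top, and the ‘more than’ column is a running total from the bottom, read back in the original order. The variable names are illustrative.

```python
from itertools import accumulate

age_groups = ["15-20", "20-25", "25-30", "30-35", "35-40", "40-45"]
women = [3, 7, 15, 21, 12, 6]

less_than = list(accumulate(women))                  # downward cumulation
more_than = list(accumulate(reversed(women)))[::-1]  # upward cumulation

for group, f, lt, mt in zip(age_groups, women, less_than, more_than):
    print(f"{group}: f = {f}, less than c.f. = {lt}, more than c.f. = {mt}")
```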
4.8.1 Conversion of cumulative frequency to simple Frequency:
If we have only the cumulative frequencies, either ‘less than’ or ‘more than’, we
can convert them into simple frequencies. For example, ‘less than’ cumulative
frequencies can be converted to simple frequencies by the method given below:
Age group   Less than c.f.   Frequency
15-20       3                3
20-25       10               10 − 3 = 7
25-30       25               25 − 10 = 15
30-35       46               46 − 25 = 21
35-40       58               58 − 46 = 12
40-45       64               64 − 58 = 6
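The subtraction shown above, and its counterpart for ‘more than’ cumulative frequencies, can be sketched as two small Python functions. Both helper names are hypothetical.

```python
def from_less_than(cf):
    """Simple frequencies from 'less than' c.f.: subtract each previous total."""
    return [c - prev for prev, c in zip([0] + cf[:-1], cf)]


def from_more_than(cf):
    """Simple frequencies from 'more than' c.f.: subtract each following total."""
    return [c - nxt for c, nxt in zip(cf, cf[1:] + [0])]


print(from_less_than([3, 10, 25, 46, 58, 64]))  # [3, 7, 15, 21, 12, 6]
print(from_more_than([64, 61, 54, 39, 18, 6]))  # [3, 7, 15, 21, 12, 6]
```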
Example 4:
Class interval   Frequency   Less than cumulative frequency   Cumulative percentage
2000-4000        8           8                                5.7
4000-6000        15          23                               16.4
6000-8000        27          50                               35.7
8000-10000       44          94                               67.1
10000-12000      31          125                              89.3
12000-14000      12          137                              97.9
14000-20000      3           140                              100.0
Total            140
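Each cumulative percentage is the cumulative frequency divided by the total frequency and multiplied by 100. A short Python sketch reproduces the last two columns; rounding to one decimal place follows the table.

```python
from itertools import accumulate

frequencies = [8, 15, 27, 44, 31, 12, 3]
cumulative = list(accumulate(frequencies))   # less than cumulative frequencies
total = cumulative[-1]                       # total frequency, 140
cum_percent = [round(100 * cf / total, 1) for cf in cumulative]

print(cumulative)   # [8, 23, 50, 94, 125, 137, 140]
print(cum_percent)  # [5.7, 16.4, 35.7, 67.1, 89.3, 97.9, 100.0]
```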
The data so classified on the basis of two variables give rise to the so-called
bivariate frequency distribution, which can be summarized in a table called a
bivariate (two-way) frequency table. While preparing a bivariate frequency
distribution, the values of each variable are grouped into various classes (not
necessarily the same for each variable). If the data corresponding to one variable,
say X, are grouped into m classes and the data corresponding to the other
variable, say Y, are grouped into n classes, then the bivariate table will consist of
m × n cells. By going through the different pairs of values (X, Y) of the variables
and using tally marks, we can find the frequency of each cell and thus obtain the
bivariate frequency table. The format of a bivariate frequency table is given
below:
                               x-series: class intervals      Marginal frequency
                               (mid-values of X)              of Y
y-series: class intervals
(mid-values of Y)              f(x, y)                        fy
Marginal frequency of X        fx                             Total: Σfx = Σfy = N
Here, f(x, y) is the frequency of the pair (x, y). The frequency distribution of the
values of the variable X together with their frequency totals (fx) is called the
marginal distribution of X, and the frequency distribution of the values of the
variable Y together with their frequency totals (fy) is known as the marginal
frequency distribution of Y. The total of the marginal frequencies is called the
grand total (N).
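The construction of a bivariate table and its two marginal distributions can be sketched in Python. The (height, weight) pairs below are hypothetical, invented for illustration; the class edges follow the intervals used in Example 5, and both helper names are our own.

```python
from collections import Counter


def class_of(value, edges):
    """Return the exclusive class (lower, upper) from consecutive edges containing value."""
    for lower, upper in zip(edges, edges[1:]):
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} lies outside the classes")


def bivariate_table(pairs, x_edges, y_edges):
    """Count each (x class, y class) cell, then the marginal totals fx and fy."""
    cells = Counter((class_of(x, x_edges), class_of(y, y_edges)) for x, y in pairs)
    fx, fy = Counter(), Counter()
    for (cx, cy), f in cells.items():
        fx[cx] += f   # marginal frequency of X
        fy[cy] += f   # marginal frequency of Y
    return cells, fx, fy


# Hypothetical (height, weight) pairs, for illustration only
pairs = [(63, 120), (65, 130), (65, 128), (67, 140), (69, 150), (71, 160)]
cells, fx, fy = bivariate_table(
    pairs, [62, 64, 66, 68, 70, 72], [115, 125, 135, 145, 155, 165])
print(sum(fx.values()) == sum(fy.values()) == len(pairs))  # Σfx = Σfy = N
```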
Example 5:
The data given below relate to the height and weight of 20 persons. Construct a
bivariate frequency table with class intervals of height as 62-64, 64-66, … and of
weight as 115-125, 125-135, …, and write down the marginal distributions of X and Y.
Weight (y) \ Height (x)   62-64   64-66   66-68   68-70   70-72   Total
115-125   II (2)   II (2)   4
155-165   I (1)   1
165-175   I (1)   1
Total                      3       5       4       4       4       20
The marginal distributions of height and weight are given in the following table.
Height (x)              Weight (y)
CI       Frequency      CI        Frequency
62-64    3              115-125   4
64-66    5              125-135   5
66-68    4              135-145   6
68-70    4              145-155   3
70-72    4              155-165   1
Total    20             165-175   1
                        Total     20
CHAPTER FIVE
Moreover, even a layman who has nothing to do with numbers can understand
diagrams. Evidence of this can be found in newspapers, magazines, journals,
advertisements, etc. An attempt is made in this chapter to illustrate some of the
major types of diagrams and graphs frequently used in presenting statistical
data.
5.2 Diagrams
A diagram is a visual form of presentation of statistical data, highlighting their
basic facts and relationships. If we draw diagrams on the basis of the data
collected, they will easily be understood and appreciated by all. Diagrams are
readily intelligible and save a considerable amount of time and energy.
Diagrams and graphs are extremely useful for the following reasons:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
5.5.1 One-dimensional diagrams
In such diagrams, only one dimension, i.e. height, is used and the width is not
considered. These diagrams are in the form of bars or lines and can be classified
as:
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line Diagram
A line diagram is used in cases where there are many items to be shown and there
is not much difference in their values. Such a diagram is prepared by drawing a
vertical line for each item according to the scale. The distance between the lines
is kept uniform.
Example 1:
Show the following data by a line chart:
No. of children 0 1 2 3 4 5
Frequency 10 14 9 6 4 2
[Figure: Line diagram of frequency against number of children]
To make the diagram attractive, the bars can be coloured. Bar diagrams are widely
used in business and economics. However, an important limitation of such
diagrams is that they can present only one classification or one category of data.
For example, while presenting the population for the last five decades, one can
only depict the total population in a simple bar diagram, and not its sex-wise
distribution.
Example 2:
Represent the following data by a bar diagram.
Year    Production (in tonnes)
1991    45
1992    40
1993    42
1994    55
1995    50
Solution:
[Figure: Simple bar diagram of production (in tonnes) for the years 1991 to 1995]
Example 3:
Draw a multiple bar diagram for the following data.
[Figure: Multiple Bar Diagram; x-axis: Year (1998 to 2001)]
The main defect of a sub-divided bar diagram is that the parts do not all have a
common base, so the various components of the data cannot be compared accurately.
Example 4:
Represent the following data by a sub-divided bar diagram.
Expenditure items    Monthly expenditure (in Rs.)
                     Family A    Family B
Food                 75          95
Clothing             20          25
Education            15          10
Housing Rent         40          65
Miscellaneous        25          35
Solution:
[Figure: Sub-divided Bar Diagram; one bar each for Family A and Family B, divided by expenditure item; y-axis from 0 to 240]
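In a sub-divided bar diagram each component is stacked on top of the previous one, so the segment boundaries are simply the running totals of the items. A minimal Python sketch for the data of Example 4 follows; the helper name `segments` is hypothetical.

```python
from itertools import accumulate

items = ["Food", "Clothing", "Education", "Housing Rent", "Miscellaneous"]
family_a = [75, 20, 15, 40, 25]
family_b = [95, 25, 10, 65, 35]


def segments(values):
    """Return (bottom, top) heights of each stacked component of one bar."""
    tops = list(accumulate(values))   # running totals = top of each part
    bottoms = [0] + tops[:-1]         # each part starts where the last ended
    return list(zip(bottoms, tops))


for name, values in [("Family A", family_a), ("Family B", family_b)]:
    print(name)
    for item, (low, high) in zip(items, segments(values)):
        print(f"  {item:<14} {low:>4} to {high:>4}")
```

The totals, 175 for Family A and 230 for Family B, fix the heights of the two bars, which is consistent with a vertical scale running up to 240.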
Example 5:
Represent the following data by a percentage bar diagram.
Rectangles:
Rectangles are used to represent the relative magnitude of two or more values.
The areas of the rectangles are kept in proportion to the values. Rectangles are
placed side by side for comparison. When two sets of figures are to be
represented by rectangles, either of two methods may be adopted.
We may represent the figures as they are given, or we may convert them to
percentages and then subdivide the length into the various components. The
percentage sub-divided rectangular diagram is more popular than the sub-divided
rectangular diagram, since it enables comparisons to be made on a percentage basis.
Example 6:
Represent the following data by a sub-divided percentage rectangular diagram.
Items of Expenditure    Family A            Family B
                        (Income Rs.5000)    (Income Rs.8000)
Food                    2000                2500
Clothing                1000                2000
House Rent              800                 1000
Fuel and lighting       400                 500
Miscellaneous           800                 2000
Total                   5000                8000
Solution:
The items of expenditure are converted into percentages as shown below:
Items of Expenditure    Family A         Family B
                        Rs.      %       Rs.      %
Food                    2000     40      2500     31
Clothing                1000     20      2000     25
House Rent              800      16      1000     13
Fuel and Lighting       400      8       500      6
Miscellaneous           800      16      2000     25
Total                   5000     100     8000     100
[Figure: Sub-divided Percentage Rectangular Diagram; rectangles for Family A (0-5000) and Family B (0-8000); y-axis: percentage]
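The conversion in Example 6 divides each item by its family's total and multiplies by 100. The sketch below reproduces the percentage columns. Rounding halves up is our assumption about how the table was rounded; it is chosen because it reproduces 12.5 becoming 13 for House Rent of Family B, whereas Python's built-in `round` rounds halves to the even neighbour and would give 12. The function name `to_percentages` is hypothetical.

```python
import math

items = ["Food", "Clothing", "House Rent", "Fuel and Lighting", "Miscellaneous"]
family_a = [2000, 1000, 800, 400, 800]    # income Rs. 5000
family_b = [2500, 2000, 1000, 500, 2000]  # income Rs. 8000


def to_percentages(values):
    """Express each value as a whole-number percentage of the total.

    Halves are rounded up with math.floor(x + 0.5); the built-in round()
    would turn 12.5 into 12 because it rounds halves to even.
    """
    total = sum(values)
    return [math.floor(100 * v / total + 0.5) for v in values]


print(dict(zip(items, to_percentages(family_a))))
print(dict(zip(items, to_percentages(family_b))))
```

With other data, rounded percentages may not add up to exactly 100; here both families total 100.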
Squares:
The rectangular method of diagrammatic presentation is difficult to use where
the values of items vary widely. The method of drawing a square diagram is very
simple. One has to take the square root of the value of each item to be shown
in the diagram and then select a suitable scale to draw the squares. Since the
side of each square is proportional to the square root of the value, widely
differing values still give squares of manageable size.
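The two steps (take square roots, then divide by a chosen scale) can be sketched as below. The yields of Example 7 are not reproduced in this section, so the inputs are hypothetical values chosen so that, with an assumed scale of 20 units of square root per cm, the sides come out as the 2 cm to 4 cm visible in the figure. The function name `square_sides` is ours.

```python
import math


def square_sides(values, units_per_cm):
    """Side (in cm) of the square for each value: sqrt(value) / scale."""
    return [math.sqrt(v) / units_per_cm for v in values]


# Hypothetical yields (kg per acre), NOT the data of Example 7.
yields = [1600, 2500, 3600, 4900, 6400]
print(square_sides(yields, units_per_cm=20))
```

Because each side is squared to give the area, the areas of the squares stay proportional to the original values.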
Example 7:
Represent the yield of rice (in kg per acre) of five countries by a square diagram.
[Figure: Square diagram; squares with sides of 2 cm, 2.5 cm, 3 cm, 3.5 cm and 4 cm]
The second step is to draw a circle of appropriate size with a compass. The size
of the radius depends upon the available space and other factors of presentation.
The third step is to mark points on the circle representing the size of each
sector with the help of a protractor.
Example 8:
Draw a pie diagram for the following data on the production of sugar (in quintals)
in various countries.
Country      Production of Sugar (in quintals)
Cuba         62
Australia    47
India        35
Japan        16
Egypt        6
Solution:
Each value is expressed in degrees as (value ÷ total) × 360°, as follows.
Country      Production of Sugar
             In Quintals    In Degrees
Cuba         62             134
Australia    47             102
India        35             76
Japan        16             35
Egypt        6              13
Total        166            360
[Figure: Pie Diagram; sectors for Cuba, Australia, India, Japan and Egypt]
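The degree column can be checked with a few lines of Python: each sector angle is the country's share of the total multiplied by 360 and rounded to the nearest degree. This is a sketch; the name `sector_angles` is hypothetical.

```python
production = {"Cuba": 62, "Australia": 47, "India": 35, "Japan": 16, "Egypt": 6}


def sector_angles(data):
    """Angle of each pie sector in whole degrees: value / total * 360."""
    total = sum(data.values())
    return {name: round(value / total * 360) for name, value in data.items()}


angles = sector_angles(production)
print(angles)
print(sum(angles.values()))
```

Because of rounding, the angles need not always add up to exactly 360°; for this data they do.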
Cube roots of figures can be found with the help of logarithms: divide the
logarithm of the figure by 3, and the antilogarithm of the result is the cube root.
Example 9:
Represent the following data by volume diagram.
Category          Number of Students
Undergraduate     64000
Postgraduate      27000
Professionals     8000
Solution:
The sides of the cubes can be determined as follows:
Category          Number of Students    Cube Root    Side of Cube
Undergraduate     64000                 40           4 cm
Postgraduate      27000                 30           3 cm
Professional      8000                  20           2 cm
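The logarithm method described above translates directly into code. The sketch below computes cube roots as antilog(log10(x) / 3) and, assuming a scale of 10 units of cube root per cm (which matches the table), converts them to sides. The function names are hypothetical. Floating-point logarithms are not exact, so the sides are rounded.

```python
import math

students = {"Undergraduate": 64000, "Postgraduate": 27000, "Professional": 8000}


def cube_root_by_logs(x):
    """Cube root via logarithms: antilog(log10(x) / 3)."""
    return 10 ** (math.log10(x) / 3)


def cube_side_cm(x, units_per_cm=10):
    """Side of the cube in cm, rounded to 2 decimals to hide float error."""
    return round(cube_root_by_logs(x) / units_per_cm, 2)


for category, count in students.items():
    print(category, round(cube_root_by_logs(count), 6), cube_side_cm(count), "cm")
```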
5.6 Graphs:
A graph is a visual form of presentation of statistical data. A graph is more
attractive than a table of figures. Even a layperson can understand the message
of the data from a graph, and comparisons between two or more phenomena can
be made very easily with its help.
Here we shall discuss only some of the more important and popular types of
graphs. They are:
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogives
5. Lorenz Curve
5.6.1 Histogram:
A histogram is a bar chart or graph showing the frequency of occurrence of each
value of the variable being analysed. In a histogram, data are plotted as a series
of rectangles. Class intervals are shown on the ‘X-axis’ and the frequencies on
the ‘Y-axis’.
When the class intervals are equal, the height of each rectangle represents the
frequency of the class interval. The rectangles are drawn adjacent to one another
so as to give a continuous picture. Such a graph is also called a staircase or
block diagram.
Example 10:
Draw a histogram for the following data.
Daily Wages (in Rs.)    Frequency
50-100                  16
100-150                 27
150-200                 19
200-250                 10
250-300                 6
Solution:
[Figure: Histogram; x-axis: Daily Wages (in Rs.), y-axis: frequency]
Example 11:
For the following data, draw a histogram.
Marks    Number of Students
21-30    6
31-40    15
41-50    22
51-60    31
61-70    17
71-80    9
Solution:
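For drawing a histogram, the frequency distribution should be continuous. If it is
not continuous, first make it continuous. Here there is a gap of 1 between the
upper limit of one class and the lower limit of the next (e.g. 30 and 31), so 0.5 is
subtracted from each lower limit and 0.5 is added to each upper limit, as follows.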
Marks        Number of Students
20.5-30.5    6
30.5-40.5    15
40.5-50.5    22
50.5-60.5    31
60.5-70.5    17
70.5-80.5    9
[Figure: Histogram; x-axis: Marks (class boundaries 20.5 to 80.5), y-axis: Number of Students]
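The conversion from inclusive class limits to continuous class boundaries can be sketched as follows, assuming integer class limits with a constant gap between consecutive classes (as in Example 11) and at least two classes. The function name `make_continuous` is hypothetical.

```python
def make_continuous(classes):
    """Convert inclusive class limits (e.g. 21-30, 31-40) to class boundaries.

    Half of the gap between the upper limit of one class and the lower
    limit of the next is subtracted from every lower limit and added to
    every upper limit. A constant gap between classes is assumed.
    """
    gap = classes[1][0] - classes[0][1]   # e.g. 31 - 30 = 1
    half = gap / 2
    return [(low - half, high + half) for low, high in classes]


marks = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
num_students = [6, 15, 22, 31, 17, 9]
for (low, high), f in zip(make_continuous(marks), num_students):
    print(f"{low}-{high}  {f}")
```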
Example 12:
Draw a histogram for the following data.
Profits (in lakhs)    Number of Companies
0-10                  4
10-20                 12
20-30                 24
30-50                 32
50-80                 18
80-90                 9
90-100                3
Solution:
When the class intervals are unequal, a correction for unequal class intervals
must be made. The frequencies are adjusted as follows. The standard class width
here is 10. The class 30-50 is twice as wide, so its frequency is divided by 2 and
assigned to each of the sub-classes 30-40 and 40-50. Similarly, the class 50-80 is
three times as wide, so its frequency is divided by 3 and assigned to each of
50-60, 60-70 and 70-80. Then draw the histogram.
Profits (in lakhs)    Number of Companies
0-10                  4
10-20                 12
20-30                 24
30-40                 16
40-50                 16
50-60                 6
60-70                 6
70-80                 6
80-90                 9
90-100                3
[Figure: Histogram of the adjusted frequencies; x-axis: Profits (in Lakhs), 0 to 100]
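The adjustment for unequal class intervals can be sketched as follows: each class is split into sub-classes of the standard width, and each sub-class receives the class frequency divided by the number of sub-classes. The standard width of 10 is taken from Example 12; the function name `adjust_unequal_classes` is hypothetical, and class widths are assumed to be whole multiples of the standard width.

```python
def adjust_unequal_classes(classes, freqs, width):
    """Split each class into sub-classes of the standard width.

    A class k times wider than `width` becomes k sub-classes, each with
    frequency f / k, so the rectangle areas remain proportional to f.
    Widths are assumed to be exact multiples of `width`.
    """
    adjusted = []
    for (low, high), f in zip(classes, freqs):
        k = (high - low) // width
        for i in range(k):
            start = low + i * width
            adjusted.append(((start, start + width), f / k))
    return adjusted


profits = [(0, 10), (10, 20), (20, 30), (30, 50), (50, 80), (80, 90), (90, 100)]
companies = [4, 12, 24, 32, 18, 9, 3]
for (low, high), f in adjust_unequal_classes(profits, companies, width=10):
    print(f"{low}-{high}  {f:g}")
```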
Example 13:
Draw a frequency polygon for the following data.
[Figure: Frequency Polygon; x-axis from 30 to 65]
Example 14:
Draw a frequency curve for the following data.
Monthly Wages (in Rs.)    Frequency
0-1000                    21
1000-2000                 35
2000-3000                 56
3000-4000                 74
4000-5000                 63
5000-6000                 40
6000-7000                 29
7000-8000                 14
Solution:
[Figure: Frequency Curve; x-axis: Monthly Wages (in Rs.), 1000 to 8000]
5.6.4 Ogives
For a set of observations, we know how to construct a frequency distribution. In
some cases, we may require the number of observations less than a given value
or more than a given value. This is obtained by accumulating (adding) the
frequencies up to (or above) the given value. This accumulated frequency is called
the cumulative frequency.
In the less than ogive method, we start with the upper limits of the classes and
keep adding the frequencies. When these cumulative frequencies are plotted, we
get a rising curve. In the more than ogive method, we start with the lower limits
of the classes and subtract the frequency of each class in turn from the total
frequency. When these frequencies are plotted, we get a declining curve.
Example 15:
Draw the Ogives for the following data.
Class interval    Frequency
20-30             4
30-40             6
40-50             13
50-60             25
60-70             32
70-80             19
80-90             8
90-100            3
Solution:
Class limit    Less than ogive    More than ogive
20             0                  110
30             4                  106
40             10                 100
50             23                 87
60             48                 62
70             80                 30
80             99                 11
90             107                3
100            110                0
[Figure: Ogives (less than and more than curves); x-axis: Class limit, scale 1 cm = 10 units; y-axis: cumulative frequency, scale 1 cm = 10 units]
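Both columns of the solution to Example 15 can be generated with running sums. In the sketch below, the less than value at each class limit is the total frequency of all classes below that limit, and the more than value is the total frequency minus that amount. `itertools.accumulate` is from the Python standard library; the function name `ogive_table` is hypothetical.

```python
from itertools import accumulate


def ogive_table(limits, freqs):
    """Return (limit, less-than, more-than) cumulative frequencies.

    `limits` are the class limits (one more than the number of classes).
    Less-than counts start at 0 and add each class frequency in turn;
    more-than counts start at the total and subtract each frequency.
    """
    total = sum(freqs)
    less_than = [0] + list(accumulate(freqs))
    more_than = [total - c for c in less_than]
    return list(zip(limits, less_than, more_than))


limits = [20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [4, 6, 13, 25, 32, 19, 8, 3]
table = ogive_table(limits, freqs)
for limit, lt, mt in table:
    print(limit, lt, mt)
```

Plotting the second column against the class limits gives the rising less than ogive; plotting the third gives the declining more than ogive.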
The Lorenz curve starts from the origin (0,0) and ends at (100,100). If wealth,
revenue, land, etc. were equally distributed among the people of a country, the
Lorenz curve would be the diagonal of the square. But this is practically
impossible. The deviation of the Lorenz curve from the diagonal shows the extent
to which wealth, revenue, land, etc. are unequally distributed among the people.
Example 16:
The following table gives the profit earned (in thousands) and the number of
companies earning it in two areas, A and B. Draw their Lorenz curves on the same
diagram and interpret them.
Profit earned     Number of Companies
(in thousands)    Area A    Area B
5                 7         13
26                12        25
65                14        43
89                28        57
110               33        45
155               25        28
180               18        13
200               8         6
Solution:
Profits                          Area A                                 Area B
Profit  Cumulative  Cumulative %  Companies  Cumulative  Cumulative %   Companies  Cumulative  Cumulative %
5       5           1             7          7           5              13         13          6
26      31          4             12         19          13             25         38          17
65      96          12            14         33          23             43         81          35
89      185         22            28         61          42             57         138         60
110     295         36            33         94          65             45         183         80
155     450         54            25         119         82             28         211         92
180     630         76            18         137         94             13         224         97
200     830         100           8          145         100            6          230         100
[Figure: Lorenz Curve; line of equal distribution with the curves for Area A and Area B; both axes from 0 to 100]
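The cumulative percentages in the solution of Example 16 can be computed by forming running totals, dividing by the grand total and expressing the result as a whole-number percentage. The function name `cumulative_percent` is hypothetical; `itertools.accumulate` is from the Python standard library. Plotting each area's cumulative percentages of companies against the cumulative percentages of profit, together with the line of equal distribution, gives the Lorenz curves.

```python
from itertools import accumulate


def cumulative_percent(values):
    """Running totals of `values` as whole-number percentages of the total."""
    running = list(accumulate(values))
    total = running[-1]
    return [round(100 * c / total) for c in running]


profits = [5, 26, 65, 89, 110, 155, 180, 200]
area_a = [7, 12, 14, 28, 33, 25, 18, 8]
area_b = [13, 25, 43, 57, 45, 28, 13, 6]

print("Profit %:", cumulative_percent(profits))
print("Area A %:", cumulative_percent(area_a))
print("Area B %:", cumulative_percent(area_b))
```

None of the percentages in this example falls exactly on a half, so Python's round-half-to-even behaviour does not affect the result here.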