Computer and Statistics


2, Tokunbo Alli Street, Ikeja, Lagos

Website: www.jptsonline.org

E-mail: inquiries@jptsonline.org, chrisafety@jptsng.org

Tel: 01- 3427217, 08132733378

COMPUTER AND STATISTICS I


JOINT PROFESSIONALS TRAINING AND SUPPORT INTERNATIONAL www.jptsonline.org

TABLE OF CONTENTS
CHAPTER ONE: INTRODUCTION TO COMPUTER
CHAPTER TWO: CLASSIFICATION OF COMPUTER
CHAPTER THREE: APPLICATION OF COMPUTER
CHAPTER FOUR: ADVANTAGES OF COMPUTER
CHAPTER FIVE: LIMITATIONS OF COMPUTER
CHAPTER SIX: COMPONENTS OF A COMPUTER SYSTEM
CHAPTER SEVEN: UNITS OF MEASUREMENT


CHAPTER ONE
INTRODUCTION TO COMPUTER
Today, almost all of us in the world make use of computers in one way or another. Computers find applications in engineering, medicine, commerce, research and many other fields. Not only in these sophisticated areas, but also in our daily lives, computers have become indispensable. They are present in many devices that we use daily, such as cars, games, washing machines and microwave ovens, and in everyday activities such as banking, reservations, electronic mail, the internet and many more.

The word computer is derived from the word compute, which means to calculate. The computer was originally defined as a superfast calculator, with the capacity to solve complex arithmetic and scientific problems at very high speed. Nowadays, in addition to handling complex arithmetic computations, computers perform many other tasks such as accepting, sorting, selecting, moving and comparing various types of information. They also perform arithmetic and logical operations on alphabetic, numeric and other types of information. The information provided by the user to the computer is called data. The information presented to the computer in one form is the input information or input data.

After processing it, the computer presents the information in another form. This information is the output information or output data.

The set of instructions given to the computer to perform various operations is known as the computer program. The process of converting the input data into the required output form with the help of the computer program is known as data processing. Computers are therefore also referred to as data processors.

Therefore, a computer can now be defined as a fast and accurate data processing system that accepts data, performs various operations on the data, stores data and produces results on the basis of detailed, step-by-step instructions given to it.
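As a minimal illustration of this definition, the Python sketch below treats a list of exam marks as the input data, a short function as the program, and the returned summary as the output data. The marks and the `process` function are hypothetical examples, not part of these notes.

```python
# Hypothetical input data: exam marks entered by a user.
input_data = [56, 72, 90, 64]

def process(marks):
    """The 'program': step-by-step instructions that turn input into output."""
    total = sum(marks)              # arithmetic operation
    average = total / len(marks)    # arithmetic operation
    highest = max(marks)            # comparison operation
    return {"total": total, "average": average, "highest": highest}

# Data processing: input data + program -> output data.
output_data = process(input_data)
print(output_data)  # {'total': 282, 'average': 70.5, 'highest': 90}
```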

The terms hardware and software are almost always used in connection with the
computer.


• The Hardware:
The hardware is the machinery itself. It is made up of the physical parts or devices of the computer system, such as the electronic Integrated Circuits (ICs), magnetic storage media and other mechanical devices like input and output devices. All these hardware components are linked together to form an effective functional unit. The hardware used in computers has evolved from the vacuum tubes of the first generation to the Ultra Large Scale Integrated circuits of the present generation.

Computer hardware is the collection of physical elements that constitutes a computer system. Computer hardware refers to the physical parts or components of a computer, such as the monitor, mouse, keyboard, computer data storage, hard disk drive (HDD) and system unit (graphics cards, sound cards, memory, motherboard and chips), all of which are physical objects that can be touched.

• The Software:
The computer hardware itself is not capable of doing anything on its own; it has to be given explicit instructions to perform a specific task. It is the computer program that controls the processing activities of the computer. The computer thus functions according to the instructions written in the program.

Software mainly consists of these computer programs, procedures and other documentation used in the operation of a computer system. Software is a collection of programs which utilize and enhance the capability of the hardware.

Software is a generic term for organized collections of computer data and instructions, often broken into two major categories: system software, which provides the basic non-task-specific functions of the computer, and application software, which users use to accomplish specific tasks.

Software Types

A. System software is responsible for controlling, integrating, and managing the individual hardware components of a computer system so that other software and the users of the system see it as a functional unit without having to be concerned with low-level details such as transferring data from memory to disk, or rendering text onto a display. Generally, system software consists of an operating system and some fundamental utilities such as disk formatters, file managers, display managers, text editors, user authentication (login) and management tools, and networking and device control software.

B. Application software is used to accomplish specific tasks other than just running the computer system. Application software may consist of a single program, such as an image viewer; a small collection of programs (often called a software package) that work closely together to accomplish a task, such as a spreadsheet or text processing system; a larger collection (often called a software suite) of related but independent programs and packages that have a common user interface or shared data format, such as Microsoft Office, which consists of a closely integrated word processor, spreadsheet, database and other programs; or a software system, such as a database management system, which is a collection of fundamental programs that may provide some service to a variety of other independent applications.


Comparison of System Software and Application Software

Definition:
• System software: software that controls and manages the computer's hardware and provides the platform on which other software runs.
• Application software: also known as an application or "app", computer software designed to help the user perform specific tasks.

Examples:
• System software: Microsoft Windows, Linux, Unix, Mac OS X, DOS.
• Application software: Opera (web browser), Microsoft Word (word processing), Microsoft Excel (spreadsheet software), MySQL (database software), Microsoft PowerPoint (presentation software), Adobe Photoshop (graphics software).

Interaction:
• System software: users generally do not interact with it directly, as it works in the background.
• Application software: users interact with it directly while carrying out different activities.

Dependency:
• System software: can run independently of application software.
• Application software: cannot run without system software.

The computers of today are vastly different in appearance and performance as compared to the computers of earlier days. But where did this technology come from and where is it heading? To fully understand the impact of computers on today’s world and the promises they hold for the future, it is important to understand the evolution of computers.

The First Generation:

The first generation computers made use of:

• Vacuum tube technology,
• Punched cards for data input,
• Punched cards and paper tape for output,
• Machine language for writing programs,
• Magnetic tapes and drums for external storage.

The computers of the first generation were very bulky and emitted a large amount of heat, which required air conditioning. They were large in size and cumbersome to handle. They had to be manually assembled and had limited commercial use. The concept of operating systems was not known at that time. Each computer had its own binary-coded language, called machine language, that told it how to operate.

The abacus, which emerged about 5000 years ago in Asia Minor and is still in use today, allows users to make computations using a system of sliding beads arranged on a rack. Early merchants used the abacus to record trading transactions.

Blaise Pascal, a French mathematician, invented one of the first mechanical calculating machines in the seventeenth century: a rectangular brass box called the Pascaline, which could perform addition and subtraction on whole numbers. Later, Colmar, a Frenchman, invented a machine that could perform the four basic arithmetic functions of addition, subtraction, multiplication and division. Colmar’s mechanical calculator, the “Arithmometer”, presented a more practical approach to computing. With its enhanced versatility, the “Arithmometer” was widely used until the First World War. Although later inventors refined Colmar’s calculator, he, together with fellow inventors Pascal and Leibniz, helped define the age of mechanical computation.

Charles Babbage, a British mathematician at Cambridge University, designed the Difference Engine and later the Analytical Engine. The Analytical Engine was designed to be programmed by instructions coded on punched cards and had a mechanical memory to store the results. For his contributions in this field, Charles Babbage is known as ‘the father of the modern digital computer’.

Some of the early computers included:


Mark I –
This was one of the first fully automatic calculating machines. It was designed by Howard Aiken of Harvard University in collaboration with IBM. This machine was an electromechanical relay computer: electromagnetic signals were used to move its mechanical parts. Mark I could perform basic arithmetic and solve complex equations. Although this machine was extremely reliable, it was very slow (a single multiplication took several seconds) and was complex in design and large in size.

Atanasoff-Berry Computer (ABC) –
This computer, developed by John Atanasoff and Clifford Berry, was one of the first electronic digital computers. It was designed to solve systems of linear equations rather than for general-purpose use. It made use of vacuum tubes for internal logic and capacitors for storage.

ENIAC (Electronic Numerical Integrator and Computer) –
One of the first general-purpose electronic computers, ENIAC was produced by a partnership between the US Government and the University of Pennsylvania. It was built using about 18,000 vacuum tubes, 70,000 resistors and 1,500 relays and consumed about 160 kilowatts of electrical power. ENIAC computed about a thousand times faster than Mark I. However, it could store and manipulate only a limited amount of data, and modifying programs and detecting errors were difficult.

EDVAC –
In the mid-1940s, Dr. John von Neumann contributed to the design of the Electronic Discrete Variable Automatic Computer, which had a memory to store both program and data. It was one of the first machines designed around the stored program concept. It had five distinct units: arithmetic, central control, memory, input and output. The key element was the central control; all the functions of the computer were coordinated through this single source. The computer was programmed in machine language.

UNIVAC –
Remington Rand built this computer for business data processing applications. The Universal Automatic Computer (UNIVAC I) was the first general-purpose computer commercially available in the United States.

The Second Generation:

In the second generation computers:

• Vacuum tube technology was replaced by transistor technology,
• The size of computers started to shrink,
• Assembly language began to be used in place of machine language,
• The concept of the stored program emerged,
• High level languages were invented.

This was the generation of transistorized computers. Vacuum tubes were replaced by transistors and, as a result, the size of the machines started shrinking. These computers were smaller, faster, more reliable and more energy efficient. One of the first transistorized computers was the TX-0. The first large scale machines that took advantage of transistor technology were the early supercomputers, Stretch by IBM and LARC by Sperry Rand. These machines were mainly developed for atomic energy laboratories. Typical computers of the second generation were the IBM 1400 and 7000 series, the Honeywell 200 and machines from General Electric.

The IBM 1401 was widely adopted throughout industry, and most large businesses routinely processed financial information using second generation computers. Machine language was replaced by assembly language. Thus the long and difficult binary code was replaced with abbreviated programming codes which were relatively easy to understand.
The stored program concept and programming languages gave computers the flexibility to finally be cost effective and productive for business use. The stored program concept meant that the instructions to run a computer for a specific task were held inside the computer’s memory and could quickly be modified or replaced by a different set of instructions for a different function. High level languages like COBOL, FORTRAN and ALGOL were developed. Computers started finding vast and varied applications. The entire software industry began with the second generation computers.

The Third Generation:


The third generation computers were characterized by:
• Use of integrated circuits,
• Phenomenal increase in computation speed,
• Substantial reduction in size and power consumption of the machines,
• Use of magnetic tapes and drums for external storage,
• Design of operating systems and new higher level languages,
• Commercial production of computers.

This generation was characterized by the invention of Integrated Circuits (ICs). The IC combined electronic components onto a small chip made from silicon, a semiconductor. This reduced the size even further. The weight and power consumption of computers decreased and the speed increased tremendously. Heavy emphasis was given to the development of software. Operating systems were designed which allowed the machine to run many different programs at once. A central program monitored and coordinated the computer’s memory. Multiprogramming was made possible, whereby the machine could work on several jobs at the same time by switching between them. Computers achieved speeds of millions of instructions per second. Commercial production became easier and cheaper. Higher level languages like Pascal and Report Program Generator (RPG) were introduced, and application-oriented languages like FORTRAN, COBOL and PL/1 were widely used.

The Fourth Generation:

The general features of the fourth generation computers were:

• Use of very large scale integration,
• Invention of microcomputers,
• Introduction of Personal Computers,
• Networking,
• Fourth Generation Languages.

The third generation computers made use of Integrated Circuits that had 10-20 components on each chip; this was Small Scale Integration (SSI).

The Fourth Generation realized Large Scale Integration (LSI), which could fit hundreds of components on one chip, and Very Large Scale Integration (VLSI), which squeezed thousands of components onto one chip. The Intel 4004 chip placed the central processing unit of a computer on a single chip, and microcomputers were introduced. Higher capacity storage media like magnetic disks were developed. Fourth generation languages emerged and application software started becoming popular.

Computer production became inexpensive and the era of Personal Computers (PCs) commenced. In 1981, IBM introduced its personal computer for use in offices, homes and schools. In direct competition, the Macintosh was introduced by Apple in 1984. Shared interactive systems and user-friendly environments were the features of these computers.


As computers became more and more powerful, they could be linked together, or networked, to share not only data but also memory space and software. Networks grew from local area networks to networks of enormous proportions. A global web of computer circuitry, the Internet, links computers worldwide into a single network of information.

The Fifth Generation:

Defining the fifth generation computers is somewhat difficult because the field is still in its infancy. The computers of tomorrow are expected to be characterized by Artificial Intelligence (AI). An example of AI is expert systems. Computers could be developed which could think and reason in much the same way as humans. Computers would be able to accept spoken words as input (voice recognition).

Many advances in the science of computer design and technology are coming together to enable the creation of fifth generation computers. Two such advances are parallel processing, where many CPUs work as one, and advances in superconductor technology, which allows electricity to flow with little or no resistance, greatly improving the speed of information flow.
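The idea behind parallel processing (one large job divided among several workers whose partial results are then combined) can be sketched in Python. This is a simplified illustration under stated assumptions: `parallel_sum` and `partial_sum` are hypothetical names, and the sketch uses a thread pool from the standard library. In CPython, threads do not actually speed up pure-Python arithmetic like this, so the sketch shows how the work is divided rather than a real speed-up; `concurrent.futures.ProcessPoolExecutor` would be the usual choice for running the pieces on separate processor cores.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """The share of the job handled by one worker ('processor')."""
    return sum(chunk)

def parallel_sum(numbers, workers=4):
    """Split a summation into chunks, run them concurrently, then combine the results."""
    if not numbers:
        return 0
    size = -(-len(numbers) // workers)  # ceiling division: items per chunk
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is summed by a worker; the partial sums are added together.
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum(list(range(1, 1001))))  # 500500
```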


CHAPTER TWO
CLASSIFICATION OF COMPUTERS
Computers are broadly classified into two categories, depending on the logic used in their design:

• Analog computers:
In analog computers, data is recognized as a continuous measurement of a physical property like voltage, speed or pressure. The output is obtained as readings on a dial or as graphs; voltage, temperature and pressure, for example, can be measured in this way.

• Digital Computers:
These are high speed, programmable electronic devices. They process data by way of mathematical calculations, comparison, sorting, etc. They accept input and produce output as discrete signals representing the high (on) or low (off) voltage state of electricity. Numbers, alphabets and symbols are all represented as a series of 1s and 0s.
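The statement that numbers, letters and symbols are all stored as 1s and 0s can be illustrated with a short Python sketch. It assumes the ASCII/Unicode character codes that Python's built-in `ord` returns (the notes do not name a particular code), and `to_bits` is a hypothetical helper.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Return the fixed-width binary (base-2) pattern of a non-negative integer."""
    return format(value, f"0{width}b")

# A number is stored directly as a pattern of bits.
print(to_bits(13))  # 00001101

# A letter or symbol is first mapped to a numeric code (here its code point
# from ord), and that code is then stored as bits.
for ch in "A7?":
    print(ch, ord(ch), to_bits(ord(ch)))
# A 65 01000001
# 7 55 00110111
# ? 63 00111111
```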

Digital computers are further classified as General Purpose Digital Computers and Special Purpose Digital Computers. General purpose computers can be used for a wide range of applications, such as accounts, payroll and data processing. Special purpose computers are used for a specific job, like those used in automobiles and microwave ovens.
Another classification of digital computers is based on their size and their capacity to access memory:

• Small Computers:
I) Microcomputers: Microcomputers are generally referred to as Personal Computers (PCs). They have the smallest memory and the least processing power. They are widely used in day to day applications like office automation and in professional applications, e.g. the IBM PC/AT and Pentium-based PCs.
II) Notebook and Laptop Computers: These are portable and battery operated. Storage devices like CDs and floppy disks, and output devices like printers, can be connected to these computers. Notebook computers are smaller in physical size than laptop computers. However, both have powerful processors, support graphics, and can accept mouse-driven input.
III) Hand Held Computers: These computers are mainly used in applications like the collection of field data. They are even smaller than notebook computers.

• Hybrid Computers: Hybrid computers are a combination of analog and digital computers. They combine the speed of analog computers with the accuracy of digital computers. They are mostly used in specialized applications where the input data is in analog form, i.e. a measurement. This is converted into digital form for further processing. These computers accept data from sensors and produce output using conventional input/output devices.
• Mini Computers: Minicomputers are more powerful than microcomputers. They have more memory and storage capacity and higher speeds. They are used in process control systems and in applications like payroll, financial accounting and computer-aided design, e.g. VAX and PDP-11.
• Mainframe Computers: Mainframe computers are very large computers which process data at very high speeds, of the order of several million instructions per second. They can be linked into a network with smaller computers, with microcomputers and with each other. They are typically used in large organizations, government departments, etc., e.g. the IBM 4381 and CDC machines.
• Super Computers: A supercomputer is the fastest, most powerful and most expensive type of computer, used for complex tasks that require a lot of computational power. Supercomputers have multiple processors which process multiple instructions at the same time. This is known as parallel processing. These computers are widely used in very advanced applications like weather forecasting and processing geological data, e.g. CRAY-2, NEC-500 and PARAM.
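The conversion step in the hybrid computers described above (an analog measurement turned into digital form) can be sketched as simple uniform quantization. This is an illustrative assumption, not a description of any particular machine: real hybrid systems use analog-to-digital converter hardware, and the sensor range and the `quantize` helper below are hypothetical.

```python
def quantize(reading: float, low: float, high: float, bits: int) -> int:
    """Map a continuous reading in [low, high] to one of 2**bits discrete levels."""
    if bits < 1:
        raise ValueError("need at least one bit")
    if not low <= reading <= high:
        raise ValueError("reading outside the sensor range")
    levels = 2 ** bits
    step = (high - low) / (levels - 1)     # size of one digital step
    return round((reading - low) / step)   # nearest whole level

# A hypothetical temperature sensor covering 0-100 degrees, sampled with 8 bits.
code = quantize(36.6, 0.0, 100.0, bits=8)
print(code, format(code, "08b"))  # 93 01011101
```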


CHAPTER THREE
APPLICATIONS OF COMPUTERS
Today computers find widespread applications in all activities of the modern
world. Some of the major application areas include:

Scientific, Engineering and Research:
This is a major area where computers find vast applications. They are used in areas which require a lot of experiments and mathematical calculations, such as weather forecasting and complex mathematical and engineering applications. Computer Aided Design (CAD) and Computer Aided Manufacturing (CAM) are used in robotics, automobile manufacturing and the design of automatic process control devices.

Business:
Record keeping, budgets, reports, inventory, payroll, invoicing and accounts are all areas of business and industry where computers are used extensively. Database management is one of the major areas where computers are used on a large scale. Application areas here include banking and airline reservations, where large amounts of data need to be updated, edited, sorted and searched in large databases.

Medicine:
Computerized systems are now in widespread use for monitoring patient data like pulse rate and blood pressure, resulting in faster and more accurate diagnosis. Modern medical equipment is highly computerized. Computers are also widely used in medical research.

Information:
This is the age of information. Television, satellite communication, the Internet and computer networks all depend on computers.

Education:
The use of computers in education is increasing day by day. Students develop the habit of thinking more logically and are able to formulate problem-solving techniques. CDs on a variety of subjects are available to impart education. Online training programs for students are also becoming increasingly popular. All the major encyclopedias, dictionaries and many books are now available in digital form and are therefore easily accessible to students today. Creativity in drawing, painting, designing, decoration, music, etc. can be well developed with computers.

Games and Entertainment:
Computer games are popular with children and adults alike. Computers are nowadays also used in entertainment areas like movies, sports and advertising.


CHAPTER FOUR
ADVANTAGES OF COMPUTERS
Speed:
The speed of a computer is measured in terms of the number of instructions that it can execute in a second. The time a computer takes to perform an operation is expressed in milliseconds ($10^{-3}$ s), microseconds ($10^{-6}$ s) and nanoseconds ($10^{-9}$ s). Computers are superfast machines: smaller computers can execute thousands of instructions per second, while more complex machines can execute millions of instructions per second.
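The relationship between the time taken per instruction and the number of instructions executed per second is a simple reciprocal. The Python sketch below assumes, for simplicity, a hypothetical machine in which every instruction takes the same amount of time; real processors vary from instruction to instruction.

```python
# Time units used when describing computer speed, expressed in seconds.
MILLISECOND = 1e-3
MICROSECOND = 1e-6
NANOSECOND = 1e-9

def instructions_per_second(time_per_instruction: float) -> float:
    """If one instruction takes t seconds, the machine executes 1/t instructions per second."""
    return 1.0 / time_per_instruction

print(instructions_per_second(1 * MILLISECOND))  # about one thousand per second
print(instructions_per_second(1 * MICROSECOND))  # about one million per second
print(instructions_per_second(10 * NANOSECOND))  # about 100 million per second
```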

Accuracy:
Computers are very accurate. They are capable of executing hundreds of
instructions without any errors. They do not make mistakes in their computations.
They perform each and every calculation with the same accuracy.

Efficiency:
The efficiency of computers does not decrease with age. Computers can perform repeated tasks with the same efficiency any number of times without tiring. Even if they are instructed to execute millions of instructions, they are capable of executing them all with the same speed and efficiency.

Storage Capability:
Computers are capable of storing large amounts of data in their storage devices. These devices occupy very little space and can store millions of characters in condensed form. These storage devices typically include floppy disks, tapes, hard disks and CDs. The data stored on these devices can be retrieved and reused whenever it is required in the future.

Versatility:
Computers are very versatile. They are capable not only of performing complex mathematical tasks in science and engineering, but also of other non-numerical operations such as airline reservations, electricity billing and database management.


CHAPTER FIVE
LIMITATIONS OF COMPUTERS
Although the computers of today are highly sophisticated, they have their own limitations. A computer cannot think on its own, since it does not have a brain of its own. It can only do what it has been programmed to do. It can execute only those jobs that can be expressed as a finite set of instructions to achieve a specific goal, and each of the steps has to be clearly defined. Computers do not learn from previous experience, nor can they arrive at a conclusion without going through all the intermediate steps. However, the impact of computers on today’s society is phenomenal, and they are today an important part of society.

A COMPUTER SYSTEM
Any system is defined as a group of integrated parts which are designed to
achieve a common objective. Thus, a system is made up of more than one
element or part, where each element performs a specific function and where all
the elements (parts) are logically related and are controlled in such a way that the
goal (purpose) of the system is achieved.

A computer is made up of a number of integrated elements like


- The central processing unit,
- The input and output devices and
- The storage devices.

Each of these units performs a specific task. However, none of them can function on its own. They are logically related and controlled to achieve a specific goal. When they are integrated in this way, they form a fully-fledged computer system.


CHAPTER SIX
COMPONENTS OF A COMPUTER SYSTEM
The basic parts of a computer system are:
• Input Unit
• The Central Processing Unit
• Output Unit

The Input Unit:
Input devices are the devices which are used to feed programs and data to the computer. The input system connects the external environment with the computer system. The input devices are the means of communication and interaction between the user and the computer system. Typical input devices include the keyboard, floppy disks, mouse, microphone, light pen, joystick, magnetic tapes, etc. The way in which data is fed into the computer through each of these devices is different. However, a computer can accept data only in a specific form. Therefore these input devices transform the data fed to them into a form which can be accepted by the computer.

An input device is any peripheral (a piece of computer hardware equipment) used to provide data and control signals to an information processing system such as a computer or other information appliance. An input device translates data from a form that humans understand into one that the computer can work with. The most common input devices are the keyboard and mouse.

Thus the functions of the input unit are to:

• accept information (data) and programs,
• convert the data into a form which the computer can accept,
• provide this converted data to the computer for further processing.
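The three functions listed above can be mirrored in a small Python sketch. The function names are hypothetical, the fixed string stands in for keystrokes arriving from a keyboard, and ASCII is assumed as the code into which the characters are converted.

```python
def accept() -> str:
    """Accept: stand-in for the characters a user types on a keyboard."""
    return "Hi 5"

def convert(text: str) -> bytes:
    """Convert: turn human-readable characters into numeric codes (ASCII assumed)."""
    return text.encode("ascii")

def provide(codes: bytes) -> list:
    """Provide: hand the converted codes on to the CPU for processing."""
    return list(codes)

print(provide(convert(accept())))  # [72, 105, 32, 53]
```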

Examples of Input Devices

Keyboard, Mouse (pointing device), Microphone, Touch screen, Scanner, Webcam, Touchpads, MIDI keyboard, Graphics tablets, Cameras, Pen input, Video capture hardware, Trackballs, Barcode reader, Digital camera, Joystick, Gamepad, Electronic whiteboard.

The Central Processing Unit:
This is the brain of any computer system. The central processing unit, or CPU, is made up of three parts:
• The control unit
• The arithmetic logic unit
• The primary storage unit

The Control Unit:
The control unit controls the operations of the entire computer system. It gets the instructions from the programs stored in the primary storage unit, interprets these instructions and then directs the other units to execute them. Thus it manages and coordinates the entire computer system.

The Arithmetic Logic Unit:
The Arithmetic Logic Unit (ALU) actually executes the instructions and performs all the calculations and decisions. The data is held in the primary storage unit and transferred to the ALU whenever needed. Data can be moved from primary storage to the ALU a number of times before the entire processing is complete. When processing is complete, the results are sent to the output storage section and to the output devices.

The Primary Storage Unit:
This is also called main memory. Before the actual processing starts, the data and the instructions fed to the computer through the input units are stored in this primary storage unit. Similarly, the data which is to be output from the computer system is also temporarily stored in primary memory. It is also the area where intermediate results of calculations are stored. The main memory has a storage section that holds the computer programs during execution. Thus the primary storage unit:
• stores data and programs during actual processing,
• stores temporary results of intermediate processing,
• stores results of execution temporarily.
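To show how the control unit, the ALU and primary storage cooperate, here is a toy stored-program machine in Python. The instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the single accumulator and the memory layout are hypothetical simplifications chosen for illustration; real processors are far more complex. The program and its data sit in the same memory list, as in the stored program concept.

```python
def run(memory):
    """Execute a toy program held in `memory` and return the memory afterwards."""
    acc = 0  # accumulator: the register the ALU works on
    pc = 0   # program counter: the control unit's pointer to the next instruction
    while True:
        op, addr = memory[pc]    # control unit fetches the next instruction
        pc += 1
        if op == "LOAD":         # control unit decodes it and directs the other units
            acc = memory[addr]   # operand moves from primary storage into the ALU
        elif op == "ADD":
            acc += memory[addr]  # ALU performs the arithmetic
        elif op == "STORE":
            memory[addr] = acc   # result is written back to primary storage
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction: {op!r}")

# Program (cells 0-3) and data (cells 4-6) share the same memory.
program = [
    ("LOAD", 4),   # acc <- memory[4]
    ("ADD", 5),    # acc <- acc + memory[5]
    ("STORE", 6),  # memory[6] <- acc
    ("HALT", 0),
    20,            # data
    22,            # data
    0,             # the result is stored here
]

result = run(list(program))  # run on a copy so `program` stays unchanged
print(result[6])  # 42
```

Here the loop that fetches and decodes each instruction plays the control unit's role, the addition into the accumulator plays the ALU's role, and the memory list plays primary storage, holding the program, the operands and the final result.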


Output Unit:
The output devices give the results of the process and computations to the
outside world. The output units accept the results produced by the computer,
convert them into a human readable form and supply them to the users. The
more common output devices are printers, plotters, display screens, magnetic
tape drives etc.

An output device is any piece of computer hardware equipment used to communicate the results of data processing carried out by an information processing system (such as a computer); it converts the electronically generated information into human-readable form.

Examples of Output Devices

Monitor, LCD projection panel, Printers (all types), Computer Output Microfilm (COM), Plotters, Speaker(s), Projector.


CHAPTER SEVEN
UNITS OF MEASUREMENT
Storage Measurements: The basic unit used in computer data storage is called a
bit (binary digit). Computers use these little bits, which are composed of ones and
zeros, to do things and talk to other computers. All your files, for instance, are
kept in the computer as binary files and translated into words and pictures by the
software (which is also ones and zeros). This ‘two-number’ system is called a
“binary number system” since it has only two numbers in it. The decimal number
system in contrast has ten unique digits, zero through nine.
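To make the binary idea concrete, here is a minimal Python sketch that converts a decimal number into its bits by repeated division by 2, and back again. The function names `to_binary` and `from_binary` are illustrative only; Python's built-in `bin()` and `int(s, 2)` do the same job.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Return the binary digits of a non-negative integer, padded to `width` bits."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, least significant first
        n //= 2
    bits = "".join(reversed(digits)) or "0"
    return bits.rjust(width, "0")   # pad on the left with zeros


def from_binary(bits: str) -> int:
    """Convert a string of 0s and 1s back to a decimal integer."""
    value = 0
    for b in bits:
        value = value * 2 + int(b)  # shift one place left, then add the new bit
    return value


print(to_binary(13))            # 00001101
print(from_binary("00001101"))  # 13
print(2 ** 8)                   # 256 different patterns fit in 8 bits
```

Eight bits make one byte, so a byte can hold 2^8 = 256 different patterns, which is why one byte can store a number from 0 to 255.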

Computer Storage Units

Unit        Symbol   Equivalent
Bit         bit      0 or 1 (one binary digit)
Byte        B        8 bits
Kilobyte    KB       1024 bytes
Megabyte    MB       1024 kilobytes
Gigabyte    GB       1024 megabytes
Terabyte    TB       1024 gigabytes
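The conversions in this table can be sketched in Python as follows. This assumes the 1024-based convention used in the table (some storage manufacturers use powers of 1000 instead); `human_readable` and `to_bytes` are hypothetical helper names, not part of any library.

```python
# Units from the table, each 1024 times the one before it.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit for which the value is at least 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024  # step up to the next unit


def to_bytes(value: float, unit: str) -> int:
    """Convert a value in KB, MB, GB or TB back to bytes."""
    return int(value * 1024 ** UNITS.index(unit))


print(to_bytes(1, "MB"))          # 1048576
print(human_readable(3_145_728))  # 3.00 MB
print(to_bytes(1, "GB") * 8)      # bits in 1 GB: 8589934592
```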

Size examples
• 1 bit: the answer to a yes/no question
• 1 byte: a number from 0 to 255
• 90 bytes: enough to store a typical line of text from a book
• 4 KB: about one page of text
• 120 KB: the text of a typical pocket book
• 3 MB: a three-minute song (128 kbit/s bitrate)
• 650-900 MB: a CD-ROM
• 1 GB: about 100 minutes of uncompressed CD-quality audio at 1.4 Mbit/s
• 8-16 GB: the size of a typical flash drive
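Several of these sizes follow from one piece of arithmetic: storage in bytes = bitrate (bits per second) × duration (seconds) ÷ 8. The short Python sketch below works out the song and CD-audio figures; the helper names are illustrative, and 1 MB and 1 GB are taken as 1024² and 1024³ bytes to match the storage table.

```python
BITS_PER_BYTE = 8
MB = 1024 ** 2
GB = 1024 ** 3


def audio_size_bytes(bitrate_bps: float, seconds: float) -> float:
    """Storage needed for audio: bitrate (bits per second) x duration, in bytes."""
    return bitrate_bps * seconds / BITS_PER_BYTE


def audio_minutes(storage_bytes: float, bitrate_bps: float) -> float:
    """Minutes of audio that fit in a given number of bytes at a given bitrate."""
    return storage_bytes * BITS_PER_BYTE / bitrate_bps / 60


# Three-minute song at 128 kbit/s (128,000 bits per second).
song = audio_size_bytes(128_000, 3 * 60)
print(f"{song / MB:.2f} MB")                          # 2.75 MB

# Uncompressed CD-quality audio at 1.4 Mbit/s stored in 1 GB.
print(f"{audio_minutes(GB, 1_400_000):.0f} minutes")  # 102 minutes
```

The exact figures depend on whether 1 MB or 1 GB is read as a power of 1000 or of 1024, so the sizes in the list are best treated as approximate.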

Speed Measurement: The speed of the Central Processing Unit (CPU) is measured in
hertz (Hz); one hertz is one clock cycle per second. The speed of the CPU is often
referred to as the computer speed.

CPU SPEED MEASURE

1 Hertz (Hz)   1 cycle per second
1 MHz          1 million cycles per second, or 1,000,000 Hz
1 GHz          1 billion cycles per second, or 1000 MHz
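A rough Python sketch of these conversions follows. The `HZ_PER` dictionary and the function names are hypothetical; the kHz entry (1000 Hz) is added for completeness even though the table does not list it. Because frequency is cycles per second, the time taken by one cycle is simply 1 ÷ frequency.

```python
# Cycles per second for each unit; each step up is a factor of 1000.
HZ_PER = {"Hz": 1, "kHz": 1_000, "MHz": 1_000_000, "GHz": 1_000_000_000}


def to_hz(value: float, unit: str) -> float:
    """Convert a clock speed such as 3 GHz into cycles per second (Hz)."""
    return value * HZ_PER[unit]


def cycle_time_ns(value: float, unit: str) -> float:
    """Duration of one clock cycle in nanoseconds (1 second = 1e9 ns)."""
    return 1e9 / to_hz(value, unit)


print(to_hz(1, "GHz") / HZ_PER["MHz"])  # 1000.0 (MHz in one GHz)
print(cycle_time_ns(2, "GHz"))          # 0.5 (ns per cycle at 2 GHz)
print(to_hz(3, "GHz") * 2)              # 6000000000 (cycles in 2 seconds at 3 GHz)
```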

STATISTICS

TABLE OF CONTENTS
CHAPTER ONE: DEFINITION, SCOPE AND LIMITATIONS
CHAPTER TWO: INTRODUCTION TO SAMPLING METHODS
CHAPTER THREE: COLLECTION OF DATA: CLASSIFICATION AND TABULATION
CHAPTER FOUR: FREQUENCY DISTRIBUTION
CHAPTER FIVE: DIAGRAMMATIC AND GRAPHICAL REPRESENTATION
CHAPTER SIX: MEASURE OF CENTRAL TENDENCY
CHAPTER SEVEN: MEASURE OF DISPERSION: SKEWNESS AND KURTOSIS
CHAPTER EIGHT: CORRELATION
CHAPTER NINE: REGRESSION
CHAPTER TEN: INDEX NUMBERS

CHAPTER ONE
DEFINITIONS, SCOPE AND LIMITATIONS
1.1 Introduction
In the modern world of computers and information technology, the importance of
statistics is well recognized in all disciplines. Statistics originated as the
science of statecraft and has slowly and steadily found applications in agriculture,
economics, commerce, biology, medicine, industry, planning, education and so
on. Today there is hardly any walk of human life where statistics cannot be
applied.

1.2. Origin and Growth of Statistics
The words ‘Statistics’ and ‘Statistical’ are derived from the Latin word status,
meaning a political state. The theory of statistics as a distinct branch of scientific
method is of comparatively recent growth. Research, particularly into the
mathematical theory of statistics, is proceeding rapidly, and fresh discoveries are
being made all over the world.

1.3 Meaning of Statistics
Statistics is concerned with scientific methods for collecting, organizing,
summarizing, presenting and analyzing data, as well as with deriving valid
conclusions and making reasonable decisions on the basis of this analysis. In
short, statistics is concerned with the systematic collection of numerical data and
its interpretation.

The word ‘statistics’ is used to refer to:

1. Numerical facts, such as the number of people living in a particular area.
2. The study of ways of collecting, analyzing and interpreting such facts.

1.4 Definitions
Statistics has been defined differently by different authors over time. In olden
days statistics was confined to state affairs only, but in modern times it embraces
almost every sphere of human activity. Consequently, a number of old definitions,
which were confined to a narrow field of enquiry, have been replaced by newer
definitions that are much more comprehensive and exhaustive. Secondly,
statistics has been defined in two different senses: as statistical data and as
statistical methods. The following are some of the definitions of statistics as
numerical data.

1. Statistics are the classified facts representing the conditions of the people in a
state. In particular, they are the facts that can be stated in numbers, in tables of
numbers, or in any tabular or classified arrangement.
2. Statistics are measurements, enumerations or estimates of natural
phenomena, usually systematically arranged, analyzed and presented so as to
exhibit important interrelationships among them.

1.4.1 Definitions by A.L. Bowley
Statistics are numerical statements of facts in any department of enquiry placed in
relation to each other. - A.L. Bowley

Bowley also called statistics ‘the science of counting’. This is obviously an
incomplete definition, as it takes into account only the collection of data and
ignores other aspects such as analysis, presentation and interpretation.

Bowley gives another definition, which states that ‘statistics may rightly be called
the science of averages’. This definition is also incomplete: averages play an
important role in understanding and comparing data, but statistics provides many
other measures as well.

1.4.2 Definition by Croxton and Cowden:
Statistics may be defined as the science of collection, presentation, analysis and
interpretation of numerical data. This definition by Croxton and Cowden is
regarded as the most scientific and realistic one.

According to this definition there are four stages:

1. Collection of Data: This is the first step, and it is the foundation upon which
the entire study rests. Careful planning is essential before collecting the data.
There are different methods of collecting data, such as census, sampling, primary,
secondary, etc., and the investigator should use the correct method.
2. Presentation of data: The mass of data collected should be presented in a
suitable, concise form for further analysis. The collected data may be presented in
tabular, diagrammatic or graphic form.
3. Analysis of data: The data presented should be carefully analyzed to make
inferences, using tools such as measures of central tendency, dispersion,
correlation, regression, etc.
4. Interpretation of data: The final step is drawing conclusions from the data
collected. A valid conclusion must be drawn on the basis of the analysis. A high
degree of skill and experience is necessary for interpretation.

1.4.3 Definition by Horace Secrist
Statistics may be defined as the aggregate of facts affected to a marked extent by
multiplicity of causes, numerically expressed, enumerated or estimated according
to a reasonable standard of accuracy, collected in a systematic manner, for a
predetermined purpose and placed in relation to each other.

The above definition seems to be the most comprehensive and exhaustive.

1.5 Functions of Statistics
There are many functions of statistics. Let us consider the following five
important functions.

1.5.1 Condensation:
Generally speaking, by the word ‘to condense’ we mean to reduce or to lessen.
Condensation aims to make a huge mass of data understandable by summarizing it
in only a few figures. If, for a particular class in a Chennai school, we are given
only the individual marks of every student in an examination, little purpose is
served. If instead we are given the average mark in that examination, it
definitely serves a better purpose. Similarly, the range of marks is another useful
summary measure of the data. Thus, statistical measures help to reduce the
complexity of the data and consequently to understand any huge mass of data.
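As a small illustration of condensation, the Python sketch below reduces a list of marks to two summary figures, the average and the range. The marks are invented for the example; `statistics.mean` is from Python's standard library.

```python
from statistics import mean

# Hypothetical examination marks for one class (illustrative data only).
marks = [45, 62, 78, 55, 90, 38, 71, 66, 59, 84]

average = mean(marks)             # one figure summarising the whole class
spread = max(marks) - min(marks)  # range: highest mark minus lowest mark

print(f"Average mark: {average}")   # Average mark: 64.8
print(f"Range of marks: {spread}")  # Range of marks: 52
```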

1.5.2 Comparison
Classification and tabulation are the two methods used to condense data. They
help us to compare data collected from different sources. Grand totals, measures
of central tendency, measures of dispersion, graphs and diagrams, the coefficient
of correlation, etc. provide ample scope for comparison.

If we have one group of data, we can make comparisons within it. If the rice
production (in tonnes) in Tanjore district is known, we can compare one region
with another region within the district. Or, if the rice production (in tonnes) of two
different districts within Tamilnadu is known, a comparative study can also be
made. As statistics is an aggregate of facts and figures, comparison is always
possible, and comparison in fact helps us to understand the data better.

1.5.3 Forecasting:
By the word forecasting, we mean to predict or to estimate beforehand. Given
rainfall data for a particular district in Tamilnadu for the last ten years, it is
possible to predict or forecast the rainfall for the near future. In business, too,
forecasting plays a dominant role in connection with production, sales, profits,
etc. Time series analysis and regression analysis play an important role in
forecasting.
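One simple way to forecast from past data is to fit a straight-line trend by least squares and extend it one period ahead. The Python sketch below does this; the rainfall figures are invented for illustration, and a straight-line trend is only one of many possible forecasting models, not a method this text prescribes.

```python
def fit_trend(years, values):
    """Least-squares straight line: value = a + b * year. Returns (a, b)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    b = sxy / sxx            # slope: average change per year
    a = mean_y - b * mean_x  # intercept
    return a, b


def forecast(a, b, year):
    """Extend the fitted trend line to a future year."""
    return a + b * year


# Hypothetical annual rainfall totals (mm) for ten years.
years = list(range(2011, 2021))
rainfall = [910, 935, 920, 960, 955, 980, 975, 1000, 1010, 1020]
a, b = fit_trend(years, rainfall)
print(round(forecast(a, b, 2021)))  # trend estimate for the following year
```

The fitted slope `b` gives the average yearly change, and the forecast is only as reliable as the assumption that the past trend continues.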

1.5.4 Estimation:
One of the main objectives of statistics is to draw inferences about a population
from the analysis of a sample drawn from that population. The four major
branches of statistical inference are:

1. Estimation theory
2. Tests of Hypothesis
3. Non-Parametric tests
4. Sequential analysis

In estimation theory, we estimate the unknown value of a population parameter
based on the sample observations. Suppose we are given a sample of the heights
of one hundred students in a school; based on the heights of these 100 students,
it is possible to estimate the average height of all students in that school.
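The height example can be simulated in Python. Since no real data are given, the sketch assumes a hypothetical school of 2,000 students whose heights are generated at random; in practice only the sample would be observed, and the population mean would be unknown.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical school of 2,000 students; heights (cm) simulated from a normal distribution.
population = [random.gauss(160, 8) for _ in range(2000)]

# Observe only a simple random sample of 100 students.
sample = random.sample(population, 100)

estimate = mean(sample)    # sample mean: the estimate of the average height
actual = mean(population)  # population mean: unknown in a real survey

print(f"Estimated average height: {estimate:.1f} cm")
print(f"Actual average height:    {actual:.1f} cm")
```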

1.5.5 Tests of Hypothesis:
A statistical hypothesis is a statement about the probability distribution
characterizing a population, which is tested on the basis of the information
available from sample observations. Statistical methods are extremely useful in
formulating and testing hypotheses. Whether crop yield has increased because of
the use of a new fertilizer, or whether a new medicine is effective in eliminating a
particular disease, are examples of hypotheses, and these are tested with
appropriate statistical tools.

1.6 Scope of Statistics:
Statistics is not a mere device for collecting numerical data; it is a means of
developing sound techniques for handling and analysing data and drawing valid
inferences from them. Statistics is applied in every sphere of human activity,
social as well as physical, such as biology, commerce, education, planning,
business management, information technology, etc. It is almost impossible to find
a single department of human activity where statistics cannot be applied. We now
briefly discuss the applications of statistics in other disciplines.

1.6.1 Statistics and Industry:
Statistics is widely used in many industries. In industry, control charts are widely
used to maintain a certain quality level. In production engineering, statistical
tools such as inspection plans and control charts are of extreme importance for
finding out whether a product conforms to specifications. Inspection plans
require some kind of sampling, a very important aspect of statistics.

1.6.2 Statistics and Commerce:
Statistics is the lifeblood of successful commerce. No businessman can afford to
lose money through either understocking or overstocking his goods. At the
outset, he estimates the demand for his goods and then takes steps to adjust his
output or purchases accordingly. Thus, statistics is indispensable in business and
commerce.

As many multinational companies have entered the Indian economy, the size and
volume of business is increasing. On one side competition is becoming stiffer,
while on the other side tastes are changing and new fashions are emerging. In this
connection, market surveys play an important role in exhibiting present conditions
and forecasting likely changes in the future.

1.6.3 Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor
R.A. Fisher, plays a prominent role in agricultural experiments. In tests of
significance based on small samples, the t statistic is adequate for testing the
significance of the difference between two sample means. In analysis of variance,
we are concerned with testing the equality of several population means.

For example, suppose five fertilizers are each applied to five plots of wheat and
the yield of wheat on each plot is recorded. In such a situation, we are interested
in finding out whether the effects of these fertilizers on the yield are significantly
different or not; in other words, whether the samples are drawn from the same
normal population or not. The answer to this problem is provided by the
technique of ANOVA, which is used to test the homogeneity of several population
means.
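The heart of ANOVA is the F ratio, which compares the variation between treatment means with the variation within treatments. A minimal Python sketch of a one-way ANOVA follows; the yields are made-up numbers chosen so the arithmetic is easy to follow, and `one_way_anova_f` is an illustrative name.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F ratio = between-group mean square / within-group mean square."""
    k = len(groups)                  # number of treatments (e.g. fertilizers)
    n = sum(len(g) for g in groups)  # total number of plots
    grand_mean = mean(x for g in groups for x in g)
    # Variation of treatment means about the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each plot about its own treatment mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical yields for three fertilizers, three plots each.
yields = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(yields))  # 27.0
```

The computed F is then compared with the tabulated value of F at (k − 1, n − k) degrees of freedom; for the text's example of five fertilizers on five plots each, that would be (4, 20) degrees of freedom. A large F suggests that the fertilizer effects differ significantly.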

1.6.4 Statistics and Economics:
Statistical methods are useful in measuring numerical changes in complex groups
and interpreting collective phenomena. Nowadays statistics is used extensively in
economic studies, and statistical methods play an important role in both
economic theory and practice.

Alfred Marshall said, “Statistics are the straw out of which I, like every other
economist, have to make the bricks.” It may also be noted that statistical data and
statistical techniques are immensely useful in solving many economic problems,
such as those concerning wages, prices, production, and the distribution of
income and wealth. Statistical tools like index numbers, time series analysis,
estimation theory and tests of statistical hypotheses are extensively used in
economics.

1.6.5 Statistics and Education:
Statistics is widely used in education. Research has become a common feature in
all branches of activity. Statistics is necessary for formulating policies to start new
courses, for considering the facilities available for new courses, etc. Many people
are engaged in research work to test past knowledge and evolve new knowledge,
and this is possible only through statistics.

1.6.6 Statistics and Planning:
Statistics is indispensable in planning. In the modern world, which can be termed
the “world of planning”, almost all government organizations seek the help of
planning for efficient working, for the formulation of policy decisions and for their
execution.

To achieve these goals, statistical data relating to production, consumption,
demand, supply, prices, investments, income, expenditure, etc., and advanced
statistical techniques for processing, analysing and interpreting such complex data
are important. In India, statistics plays an important role in planning and
commissioning at both the central and state government levels.

1.6.7 Statistics and Medicine:
In medical sciences, statistical tools are widely used. To test the efficacy of a new
drug or medicine, the t-test is used; to compare the efficacy of two drugs or two
medicines, the t-test for two samples is used. More and more applications of
statistics are now being made in clinical investigation.
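A minimal Python sketch of the two-sample t statistic is given below. It assumes the classical form with a pooled variance (that is, equal population variances); the recovery times are invented, and the function name `two_sample_t` is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Student's t statistic for two independent samples, using a pooled variance."""
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common variance; statistics.variance divides by n - 1.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, n1 + n2 - 2  # the statistic and its degrees of freedom


# Hypothetical recovery times (days) under two drugs.
drug_a = [1, 2, 3, 4, 5]
drug_b = [2, 3, 4, 5, 6]
print(two_sample_t(drug_a, drug_b))  # (-1.0, 8)
```

The resulting t is compared with the tabulated t value at n₁ + n₂ − 2 degrees of freedom to decide whether the difference between the two drugs is significant.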

1.6.8 Statistics and Modern applications:
Recent developments in computer technology and information technology have
enabled statisticians to integrate their models into the decision-making
procedures of many organizations. Many software packages are available for
design of experiments, forecasting, simulation problems, etc.

SYSTAT, a software package, offers more scientific and technical graphing options
than any other desktop statistics package. SYSTAT supports scientific and
technical research in diverse fields, such as the following:

1. Archeology: Evolution of skull dimensions
2. Epidemiology: Tuberculosis
3. Statistics: Theoretical distributions
4. Manufacturing: Quality improvement
5. Medical research: Clinical investigations.
6. Geology: Estimation of Uranium reserves from ground water

1.7 Limitations of statistics:
Statistics, with all its wide application in every sphere of human activity, has its
own limitations. Some of them are given below.

1. Statistics is not suitable for the study of qualitative phenomena: Since
statistics is basically a science that deals with sets of numerical data, it is
applicable only to those subjects of enquiry that can be expressed in terms of
quantitative measurements. As a matter of fact, qualitative phenomena like
honesty, poverty, beauty, intelligence, etc., cannot be expressed numerically, and
statistical analysis cannot be applied directly to them. Nevertheless, statistical
techniques may be applied indirectly by first reducing the qualitative
expressions to precise quantitative terms. For example, the intelligence of a
group of students can be studied on the basis of their marks in a particular
examination.
2. Statistics does not study individuals: Statistics does not give any specific
importance to individual items; it deals with aggregates of objects. Individual
items, taken one at a time, do not constitute statistical data and do not serve any
purpose in a statistical enquiry.
3. Statistical laws are not exact: Mathematical and physical sciences are well
known to be exact, but statistical laws are only approximations. Statistical
conclusions are not universally true; they are true only on average.
4. Statistics is liable to be misused: Statistics must be used only by experts;
otherwise, statistical methods are the most dangerous tools in the hands of the
inexpert. The use of statistical tools by inexperienced and untrained persons
might lead to wrong conclusions. Statistics can easily be misused by quoting wrong
figures. As King aptly says, ‘statistics are like clay of which one can make a God or
Devil as one pleases’.

5. Statistics is only one of the methods of studying a problem: The statistical
method does not provide a complete solution to problems, because problems
must be studied against the background of a country's culture, philosophy or
religion. Thus, a statistical study should be supplemented by other evidence.

CHAPTER TWO
INTRODUCTION TO SAMPLING METHODS
2.1 Introduction
Sampling is very often used in our daily life. For example, while purchasing food
grains from a shop, we usually examine a handful from the bag to assess the
quality of the commodity. A doctor examines a few drops of blood as a sample and
draws conclusions about the blood constitution of the whole body. Thus, most of
our investigations are based on samples. In this chapter, let us see the importance
of sampling and the various methods of selecting samples from a population.
2.2 Population
In a statistical enquiry, all the items that fall within the purview of the enquiry are
known as the Population or Universe. In other words, the population is the
complete set of all possible observations of the type to be investigated. All the
students studying in a school or college, all the books in a library, and all the
houses in a village or town are some examples of populations.

Sometimes it is possible and practical to examine every person or item in the
population we wish to describe. We call this a complete enumeration, or census.
We use sampling when it is not possible to measure every item in the population.
Statisticians use the word population to refer not only to people but to all items
that have been chosen for study.

2.2.1 Finite population and infinite population
A population is said to be finite if it consists of a finite number of units. The
workers in a factory and the articles produced by a company on a particular day
are examples of finite populations. The total number of units in a population is
called the population size. A population is said to be infinite if it has an infinite
number of units, for example, the stars in the sky or the people watching
television programmes.

2.2.2 Census Method
Information on a population can be collected in two ways: the census method and
the sample method. In the census method, every element of the population is
included in the investigation. For example, if we study the average annual income
of the families of a particular village or area, and there are 1000 families in that
area, we must study the income of all 1000 families. In this method, no family is
left out, as each family is a unit.

Population census of India
The population census of India is taken at ten-yearly intervals. The latest census
was taken in 2001. The first census was taken in 1871–72.

[Latest population census of India is included at the end of the chapter.]

2.2.3 Merits and limitations of Census method
Merits
1. The data are collected from each and every item of the population.
2. The results are more accurate and reliable, because every item of the
universe is covered.
3. Intensive study is possible.
4. The data collected may be used for various surveys, analyses, etc.

Limitations
1. It requires a large number of enumerators, and it is a costly method.
2. It requires more money, labour, time, energy, etc.
3. It is not possible in circumstances where the universe is infinite.

2.3 Sampling
Although the theory of sampling has been developed only recently, the idea of
sampling is not new. In everyday life we have been using sampling, as discussed in
the introduction. In all those cases, we believe that the samples give a correct idea
about the population. Most of our decisions are based on the examination of a
few items, that is, on sample studies.

2.3.1 Sample
Statisticians use the word sample to describe a portion chosen from the
population. A finite subset of statistical individuals defined in a population is
called a sample. The number of units in a sample is called the sample size.

Sampling unit
The constituents of a population that are sampled individually, and that cannot be
further subdivided for the purpose of sampling, are called sampling units. For
example, to know the average income per family, the head of the family is a
sampling unit. To know the average yield of rice, each farm owner’s yield of rice is
a sampling unit.

Sampling frame
For adopting any sampling procedure, it is essential to have a list identifying each
sampling unit by a number. Such a list or map is called a sampling frame. A list of
voters, a list of householders, a list of villages in a district and a list of farmers are
a few examples of sampling frames.
2.3.2 Reasons for selecting a sample
Sampling is inevitable in the following situations:

1. Complete enumeration is practically impossible when the population is
infinite.
2. When the results are required in a short time.
3. When the area of survey is wide.
4. When resources for the survey are limited, particularly in respect of money and
trained persons.
5. When the item or unit is destroyed in the course of investigation.
2.3.3 Parameters and statistics
We can describe samples and populations by using measures such as the mean,
median, mode and standard deviation. When these terms describe the
characteristics of a population, they are called parameters. When they describe
the characteristics of a sample, they are called statistics. A parameter is a
characteristic of a population, and a statistic is a characteristic of a sample. Since
samples are subsets of the population, statistics provide estimates of the
parameters. That is, when the parameters are unknown, they are estimated from
the values of the statistics.

In general, we use Greek or capital letters for population parameters and
lower-case Roman letters for sample statistics. [N, µ and σ are the standard
symbols for the size, mean and S.D. of the population; n, x̄ and s are the standard
symbols for the size, mean and S.D. of the sample, respectively.]
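The distinction between parameters and statistics can be seen in a short Python sketch. The marks are randomly generated for illustration. Note one convention the text does not spell out: here the population S.D. σ is computed with divisor N (`statistics.pstdev`), while the sample S.D. s uses divisor n − 1 (`statistics.stdev`), which is the usual choice when s is meant to estimate σ.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(7)  # fixed seed so the illustration is repeatable

# Hypothetical population: marks of N = 50 students.
population = [random.randint(30, 95) for _ in range(50)]
N, mu, sigma = len(population), mean(population), pstdev(population)  # parameters

# A simple random sample of n = 10 students drawn from that population.
sample = random.sample(population, 10)
n, x_bar, s = len(sample), mean(sample), stdev(sample)                # statistics

print(f"Population: N = {N}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Sample:     n = {n}, x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"Sampling error in the mean (x-bar - mu): {x_bar - mu:.2f}")
```

The difference x̄ − µ printed on the last line is the sampling error in the estimate of the mean.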

2.3.4 Principles of Sampling
Samples have to provide good estimates. The following principles tell us that
sample methods provide such good estimates.

1. Principle of statistical regularity
A moderately large number of units chosen at random from a large group are
almost sure, on average, to possess the characteristics of the large group.

2. Principle of Inertia of large numbers

Other things being equal, as the sample size increases, the results tend to be
more accurate and reliable.

3. Principle of Validity
This states that the sampling methods provide valid estimates about the
population units (parameters).

4. Principle of Optimization
This principle takes into account the desirability of obtaining a sampling design
that gives optimum results, that is, one that minimizes the risk or loss associated
with the sampling design.
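The principle of inertia of large numbers can be illustrated with a small simulation. The Python sketch below draws many random samples of different sizes from a hypothetical population and reports the average size of the error in the sample mean; the population values are randomly generated, not real data, and `average_abs_error` is an illustrative name.

```python
import random
from statistics import fmean

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical population of 10,000 values between 0 and 100.
population = [random.uniform(0, 100) for _ in range(10_000)]
mu = fmean(population)  # the population mean (the parameter being estimated)


def average_abs_error(sample_size, repetitions=200):
    """Average absolute difference between the sample mean and mu over repeated samples."""
    errors = [abs(fmean(random.sample(population, sample_size)) - mu)
              for _ in range(repetitions)]
    return fmean(errors)


for size in (10, 100, 1000):
    print(f"n = {size:4d}: average error = {average_abs_error(size):.2f}")
```

With this setup the average error shrinks as the sample size grows, roughly in proportion to 1/√n, which is the behaviour the principle describes.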

The foremost purpose of sampling is to gather maximum information about the
population under consideration at minimum cost, time and manpower. This is
best achieved when the sample reflects all the properties of the population.

37
JOINT PROFESSIONALS TRAINING AND SUPPORT INTERNATIONAL www.jptsonline.org

Sampling errors and non-sampling errors
The two types of errors in a sample survey are sampling errors and non-sampling
errors.

1. Sampling errors
Although a sample is a part of the population, it cannot generally be expected to
supply full information about the population. So, in most cases, there will be a
difference between statistics and parameters. The discrepancy between a
parameter and its estimate that is due to the sampling process is known as
sampling error.

2. Non-sampling errors:
In all surveys, some errors may occur during the collection of the actual
information. These errors are called non-sampling errors.

2.3.5 Advantages and Limitations of Sampling:
There are many advantages of sampling methods over the census method. They
are as follows:

1. Sampling saves time and labour.
2. It results in a reduction of cost in terms of money and man-hours.
3. Sampling can give more accurate results, because a smaller survey can be
conducted more carefully, which reduces non-sampling errors.
4. It has greater scope.
5. It has greater adaptability.
6. If the population is too large, or hypothetical, or destroyed by testing, sampling
is the only method that can be used.

The limitations of sampling are given below:
1. Sampling has to be done by qualified and experienced persons; otherwise, the
information will be unreliable.
2. A sample may sometimes contain extreme values instead of a representative
mix of values.
3. There is the possibility of sampling errors. A census survey is free from
sampling error.
2.4 Types of Sampling:
The technique of selecting a sample is of fundamental importance in sampling
theory, and it depends upon the nature of the investigation. The sampling
procedures that are commonly used may be classified as:

1. Probability sampling.
2. Non-probability sampling.
3. Mixed sampling.
2.4.1 Probability sampling (Random Sampling)
A probability sample is one where the selection of units from the population is
made according to known probabilities, e.g. simple random sampling, sampling
with probability proportional to size, etc.

2.4.2 Non-Probability Sampling
It is the one where discretion is used to select ‘representative’ units from the
population, or to infer that a sample is ‘representative’ of the population. This
method is called judgement or purposive sampling. It is mainly used for opinion
surveys; a common type of judgement sample used in surveys is the quota
sample. In general, this method is not used because of the possible prejudice and
bias of the enumerator. However, if the enumerator is experienced and expert,
this method may yield valuable results. For example, in a market research survey
of the performance of a company's new car, the sample consisted of all
purchasers of the new car.
2.4.3 Mixed Sampling
Here, samples are selected partly according to some probability and partly
according to a fixed sampling rule; they are termed mixed samples, and the
technique of selecting such samples is known as mixed sampling.
2.5 Methods of selection of samples
Here we shall consider the following three methods:
1. Simple random sampling.
2. Stratified random sampling.
3. Systematic random sampling.
1. Simple random sampling
A simple random sample from a finite population is a sample selected such that
each possible sample combination has an equal probability of being chosen. It is
also called unrestricted random sampling.
2. Simple random sampling without replacement
In this method, a population element can enter the sample only once, i.e. a unit
once selected is not returned to the population before the next draw.

3. Simple random sampling with replacement:
In this method, the population units may enter the sample more than once.
Simple random sampling may therefore be either with or without replacement.
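Python's standard `random` module provides both kinds of draw, as the minimal sketch below shows: `random.sample` selects without replacement and `random.choices` selects with replacement. The population of 20 numbered units is hypothetical.

```python
import random

random.seed(3)  # fixed seed so the illustration is repeatable

units = list(range(1, 21))  # a hypothetical population of 20 numbered units

# Without replacement: a unit, once drawn, is not returned, so it cannot repeat.
srswor = random.sample(units, 5)

# With replacement: every draw is from all 20 units, so repeats are possible.
srswr = random.choices(units, k=5)

print("Without replacement:", srswor)
print("With replacement:   ", srswr)
```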

2.5.1 Methods of selection of a simple random sample:
The following are some methods of selecting a simple random sample.

a) Lottery Method:
This is the most popular and simplest method. In this method, all the items of the
population are numbered on separate slips of paper of the same size, shape and
colour. The slips are folded and mixed up in a container, and the required number
of slips is selected at random for the desired sample size. For example, if we want
to select 5 students out of 50, we write the names or roll numbers of all 50
students on slips and mix them. Then we make a random selection of 5 students.

This method is mostly used in lottery draws. If the universe is infinite, this method
is inapplicable.
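The lottery method can be mimicked in a few lines of Python: shuffling the list of roll numbers plays the role of mixing the slips, and taking the first k items plays the role of drawing them. This is an illustrative sketch (the function name `lottery_sample` is hypothetical), not a replacement for a properly conducted draw.

```python
import random


def lottery_sample(roll_numbers, k, seed=None):
    """Mimic the lottery method: mix all slips thoroughly, then draw k of them."""
    rng = random.Random(seed)   # separate generator; a seed only makes the draw repeatable
    slips = list(roll_numbers)  # one 'slip' per unit of the population
    rng.shuffle(slips)          # mixing the slips in the container
    return sorted(slips[:k])    # the first k slips drawn form the sample


students = range(1, 51)         # roll numbers of the 50 students
print(lottery_sample(students, 5, seed=10))
```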

b) Table of Random numbers:
As the lottery method cannot be used when the population is infinite, the
alternative method is to use a table of random numbers. There are several
standard tables of random numbers; three of them are:

1. Tippett’s table
2. Fisher and Yates’ table
3. Kendall and Smith’s table

A random number table is so constructed that all the digits 0 to 9 appear
independently of each other with equal frequency. If we have to select a sample
from a population of size N = 100, the digits can be combined three by three to
give the numbers from 001 to 100.

[See Appendix for the random number table]


Procedure to select a sample using a random number table:
The units of the population from which a sample is required are assigned numbers
with an equal number of digits. When the size of the population is less than a
thousand, the three-digit numbers 000, 001, 002, ….. 999 are assigned. We may
start at any place and may go on in any direction, such as column-wise or
row-wise, in a random number table, but consecutive numbers are to be used.

On the basis of the size of the population and the random number table available,
we proceed according to our convenience. If any random number is greater than
the population size N, then N can be subtracted from the random number drawn.
This can be repeated until the number is less than or equal to N.


Example 1:
In an area there are 500 families. Using the following extract from a table of
random numbers select a sample of 15 families to find out the standard of living
of those families in that area.

4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950
Solution:
In the above random number table, we can start from any row or column and
read three-digit numbers continuously, row-wise or column-wise.

Now we start from the third row and read row-wise, continuing into the fourth row
when the third row is exhausted. The first 15 numbers are:

203 023 277 353 600 794 109 179 272 284 450 641 148 908 280

Since some numbers (600, 794, 641 and 908) are greater than 500, we subtract 500
from those numbers and rewrite the selected numbers as follows:

203 023 277 353 100 294 109 179 272 284 450 141 148 408 280
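The solution above can be reproduced with a short Python sketch. It reads the third and fourth rows of the extract as one continuous digit string, takes three digits at a time, and applies the subtraction rule. Skipping 0 and repeated numbers is an assumption added here for sampling without replacement (units labelled 1 to N); no such case arises in this example, so the result is unchanged. The function name `select_from_table` is hypothetical.

```python
# Rows 3 and 4 of the random number extract in Example 1
ROW_3 = "2030 2327 7353 6007 9410 9179 2722 8445"
ROW_4 = "0641 1489 0828 0385 8488 0422 7209 4950"


def select_from_table(rows, N, n, width=3):
    """Read `width`-digit numbers row-wise, reduce each one by repeatedly
    subtracting N until it is at most N, and keep the first n distinct labels."""
    digits = "".join(ch for row in rows for ch in row if ch.isdigit())
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        while number > N:              # the subtraction rule from the text
            number -= N
        # Assumption: units are labelled 1..N, so 0 and repeats are skipped.
        if number == 0 or number in chosen:
            continue
        chosen.append(number)
        if len(chosen) == n:
            break
    return chosen


sample = select_from_table([ROW_3, ROW_4], N=500, n=15)
print(sample)
```

The printed list matches the 15 numbers in the solution, with 023 shown as 23.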


c) Random number selection using calculators or computers:

Random numbers can be generated with a scientific calculator or a computer. Each
press of the key gives a new random number. The sample is then selected in the
same way as with a random number table.
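On a computer, the same selection can be made with a pseudo-random number generator. Below is a minimal sketch using Python's standard `random` module. It assumes the units are numbered 1 to N and that sampling is without replacement. The function name and the seed value are illustrative only.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct unit numbers from 1..N, each unit equally likely."""
    rng = random.Random(seed)          # seeded only to make runs repeatable
    return sorted(rng.sample(range(1, N + 1), n))


# 15 families out of 500, as in Example 1
families = simple_random_sample(500, 15, seed=2024)
print(families)
```

`random.sample` never returns the same unit twice, so the duplicate check needed when reading a table by hand is handled automatically.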

Merits and Limitations of Simple Random Sampling

Merits
1. Personal bias is eliminated, as selection depends solely on chance.
2. A random sample is in general a representative sample for a homogeneous
population.
3. There is no need for a thorough knowledge of the units of the population.
4. The accuracy of a sample can be tested by examining another sample from
the same universe when the universe is unknown.
5. This method is also used within other methods of sampling.

Limitations:
1. Preparing lots or using random number tables is tedious when the
population is large.
2. When the units of the population differ widely, a simple random sample
may not be representative.
3. The sample size required under this method is larger than that required
by stratified random sampling for the same precision.
4. The units of a simple random sample generally lie far apart
geographically, so the cost and time of data collection are higher.
2.5.2 Stratified Random Sampling
Of all the methods of sampling, the one most commonly used in surveys is
stratified sampling. This technique is mainly used to reduce the effect of
population heterogeneity and to increase the efficiency of the estimates.
Stratification means division into groups. In this method, the population is
divided into a number of subgroups or strata. The strata should be formed so
that each stratum is as homogeneous as possible. Then a simple random sample is
selected from each stratum, and these are combined to form the required sample
from the population.


Types of Stratified Sampling


There are two types of stratified sampling: proportional and non-proportional.
In proportional sampling, each subgroup or stratum is represented in proportion
to its size: a stratum with a large number of items gets a larger share of the
sample, and vice versa.

If the population size is N and the sample size is n, the sample is allocated to
the strata so that the sampling fraction is the same constant for every stratum,
that is, $n_1/N_1 = n_2/N_2 = \dots = n/N = c$, where $N_i$ and $n_i$ are the
population and sample sizes of stratum $i$. So in this method each stratum is
represented according to its size.

In non-proportional sampling, equal representation is given to all the strata,
regardless of their size in the population.
Example 2:
A sample of 50 students is to be drawn from a population consisting of 500
students belonging to two institutions A and B. The number of students in the
institution A is 200 and the institution B is 300. How will you draw the sample
using proportional allocation?
Solution:
There are two strata in this case, with sizes $N_1 = 200$ and $N_2 = 300$, and the
total population is $N = N_1 + N_2 = 500$. The sample size is $n = 50$.

If $n_1$ and $n_2$ are the sample sizes from the two strata,

$$n_1 = n \times \frac{N_1}{N} = 50 \times \frac{200}{500} = 20$$

$$n_2 = n \times \frac{N_2}{N} = 50 \times \frac{300}{500} = 30$$

The sample sizes are 20 from A and 30 from B. Then the units from each
institution are to be selected by simple random sampling.
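The allocation in Example 2, followed by simple random sampling within each stratum, can be sketched in Python. This is a minimal illustration with hypothetical function names. When $n N_i / N$ is not a whole number, the sketch distributes the units lost by rounding down to the strata with the largest fractional parts (a largest-remainder rule). That rounding rule is an assumption made here, not something the text prescribes. In Example 2 the division is exact, so no rounding occurs.

```python
import math
import random
from fractions import Fraction


def proportional_allocation(n, stratum_sizes):
    """Allocate n so that n_i / N_i equals n / N in every stratum (up to rounding)."""
    N = sum(stratum_sizes)
    quotas = [Fraction(n * N_i, N) for N_i in stratum_sizes]   # n * N_i / N exactly
    sizes = [math.floor(q) for q in quotas]
    # Assumption: units lost by rounding down go to the strata with the
    # largest fractional parts (largest-remainder rule).
    leftover = n - sum(sizes)
    order = sorted(range(len(quotas)), key=lambda i: quotas[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def stratified_sample(n, stratum_sizes, seed=None):
    """Draw a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = proportional_allocation(n, stratum_sizes)
    return [sorted(rng.sample(range(1, N_i + 1), n_i))
            for N_i, n_i in zip(stratum_sizes, sizes)]


print(proportional_allocation(50, [200, 300]))   # [20, 30]
samples = stratified_sample(50, [200, 300], seed=1)
print([len(s) for s in samples])                 # [20, 30]
```

Exact fractions are used so that a quota such as 20 is not turned into 19.999... by floating-point arithmetic before it is rounded down.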


Merits and Limitations of Stratified Sampling


Merits:
1. It is more representative.
2. It ensures greater accuracy.
3. It is easy to administer, as the universe is sub-divided.
4. Greater geographical concentration reduces time and expenses.
5. When the original population is badly skewed, this method is appropriate.
6. For a non-homogeneous population, it may yield good results.

Limitations
1. Dividing the population into homogeneous strata requires more money,
time and statistical experience, which can be difficult to provide.
2. Improper stratification leads to bias; if the different strata overlap,
the sample will not be a representative one.
2.5.3 Systematic Sampling:
This method is widely employed because of its ease and convenience. Systematic
sampling is frequently used when a complete list of the population is available.
It is also called quasi-random sampling.

Selection Procedure
In systematic sampling, the whole sample selection is based on just one random
start: the first unit is selected with the help of random numbers, and the rest
are selected automatically according to a predesigned pattern. With systematic
random sampling, every Kth element in the frame is selected for the sample, with
the starting point among the first K elements determined at random.

For example, suppose we want to select a sample of 50 students from 500 students.
Under this method every Kth item is picked from the sampling frame, and K is
called the sampling interval:

$$K = \frac{N}{n} = \frac{\text{Population size}}{\text{Sample size}} = \frac{500}{50} = 10$$

K = 10 is the sampling interval. A systematic sample consists of selecting a
random number, say i, with i ≤ K, and then every Kth unit after it. Suppose the
random number i is 5; then we select units 5, 15, 25, 35, 45, ..., 495. The random
number i is called the random start. The technique can generate K possible
systematic samples, each with equal probability.
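The selection procedure above can be sketched in Python as follows. This is a minimal illustration assuming units numbered 1 to N and N an exact multiple of n, so that K = N / n is a whole number. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Select every Kth unit from 1..N after a random start i, 1 <= i <= K.

    Assumes N is a multiple of n, so the sampling interval K = N / n is exact.
    """
    K = N // n                                        # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, K)     # the random start
    if not 1 <= start <= K:
        raise ValueError("the random start must lie between 1 and K")
    return list(range(start, N + 1, K))


units = systematic_sample(500, 50, start=5)
print(units[:5], units[-1], len(units))   # [5, 15, 25, 35, 45] 495 50
```

Because the start can take any of the K values 1, 2, ..., K, there are exactly K possible samples, and each is equally likely when the start is chosen at random.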
Merits
1. This method is simple and convenient.
2. Time and work are greatly reduced.
3. If proper care is taken, the results will be accurate.
4. It can be used for an infinite population.

Limitations
1. Systematic sampling may not represent the whole population.
2. There is a chance of personal bias on the part of the investigators.

Systematic sampling is preferably used when the information is to be collected
from trees in a forest, houses in blocks, entries in a register that are in
serial order, etc.


CHAPTER THREE
COLLECTION OF DATA:
CLASSIFICATION AND TABULATION
3.1 Introduction
Everybody collects, interprets and uses information in day-to-day life, much of it
in numerical or statistical form. People receive large quantities of information
every day through conversations, television, computers, radio, newspapers,
posters, notices and instructions. It is precisely because there is so much
information available that people need to be able to absorb, select and reject
it. In everyday life, and in business and industry, certain statistical
information is necessary, and it is important to know where to find it and how to
collect it. As consumers, everybody has to compare prices and quality before
making any decision about what goods to buy. As employees of any firm, people
want to compare their salaries, working conditions, promotion opportunities and
so on. In turn, firms on their part want to control costs and expand their
profits.

One of the main functions of statistics is to provide information that will help
in making decisions. Statistics provides this type of information through a
description of the present, a profile of the past and an estimate of the future.
The following are some of the objectives of collecting statistical information:

1. To describe the methods of collecting primary statistical information.
2. To consider the stages involved in carrying out a survey.
3. To analyse the process involved in observation and interpretation.
4. To define and describe sampling.
5. To analyse the basis of sampling.
6. To describe a variety of sampling methods.

Statistical investigation is a comprehensive process. It requires the systematic
collection of data about some group of people or objects, describing and
organizing the data, analyzing the data with the help of different statistical
methods, summarizing the analysis, and using these results for making judgements,
decisions and predictions. The validity and accuracy of the final judgement are
most crucial and depend heavily on how well the data were collected in the first
place. The quality of the data will greatly affect the conclusions, and hence the
utmost importance must be given to this process, and every possible precaution
should be taken to ensure accuracy while collecting the data.
3.2 Nature of data:
It may be noted that different types of data can be collected for different
purposes. The data can be collected in connection with time, with geographical
location, or with both time and location. The following are the three types of
data:

1. Time series data
2. Spatial data
3. Spatio-temporal data

3.2.1 Time series data:


It is a collection of a set of numerical values, collected over a period of time. The
data might have been collected either at regular intervals of time or irregular
intervals of time.

Example 1:
The following data show the expenditure of a family (in rupees) under three heads
for the four years 2001, 2002, 2003 and 2004.

Year    Food    Education    Others    Total
2001    3000    2000         3000       8000
2002    3500    3000         4000      10500
2003    4000    3500         5000      12500
2004    5000    5000         6000      16000

3.2.2 Spatial Data:

If the data collected are connected with a place, they are termed spatial data.
For example, the data may be:

1. Number of runs scored by a batsman in different test matches of a test
series at different places
2. District-wise rainfall in Tamil Nadu
3. Prices of silver in four metropolitan cities

Example 2:
The population of the southern states of India in 1991.

State             Population
Tamil Nadu        5,56,38,318
Andhra Pradesh    6,63,04,854
Karnataka         4,48,17,398
Kerala            2,90,11,237
Pondicherry         7,89,416

3.2.3 Spatio-Temporal Data:

If the data collected are connected with time as well as place, they are known as
spatio-temporal data.

Example 3:
Population of the southern states of India in 1981 and 1991.

State             Population 1981   Population 1991
Tamil Nadu        4,82,97,456       5,56,38,318
Andhra Pradesh    5,34,03,619       6,63,04,854
Karnataka         3,70,43,451       4,48,17,398
Kerala            2,54,03,217       2,90,11,237
Pondicherry         6,04,136          7,89,416

3.3 Categories of data


Any statistical data can be classified under two categories, depending upon the
sources utilized. These categories are:

1. Primary data
2. Secondary data

3.3.1 Primary data


Primary data are those collected by the investigator himself for the purpose of a
specific inquiry or study. Such data are original in character and are generated
by surveys conducted by individuals, research institutions or other
organizations.

Example 4
If a researcher is interested in knowing the impact of the noon-meal scheme on
school children, he has to undertake a survey and collect data on the opinions of
parents and children by asking relevant questions. Data collected for such a
purpose are called primary data.

The primary data can be collected by the following five methods.

1. Direct personal interviews
2. Indirect oral interviews
3. Information from correspondents
4. Mailed questionnaire method
5. Schedules sent through enumerators
1. Direct Personal Interviews
The persons from whom information is collected are known as informants. The
investigator personally meets them and asks questions to gather the necessary
information. This method is suitable for intensive rather than extensive field
surveys; it is best suited to the intensive study of a limited field.
Merits
1. People willingly supply information because they are approached
personally. Hence, the response is better in this method than in any
other method.
2. The information collected is likely to be uniform and accurate, since the
investigator is there to clear the doubts of the informants.

3. Supplementary information on the informant’s personal aspects can be
noted. Information on character and environment may help later to
interpret some of the results.
4. Answers to questions about which the informant is likely to be sensitive
can be gathered by this method.
5. The wording of one or more questions can be altered to suit any
informant. Explanations may also be given in other languages.
Inconvenience and misinterpretation are thereby avoided.

Limitations
1. It is very costly and time consuming.
2. It is very difficult, when the number of persons to be interviewed is large
and the persons are spread over a wide area.
3. Personal prejudice and bias are greater under this method.
2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses or neighbours or friends or
some other third parties who are capable of supplying the necessary information.
This method is preferred if the required information is about addiction or the
cause of a fire, theft or murder. If a fire has broken out at a certain place, the
persons living in the neighbourhood and witnesses are likely to give information
on the cause of the fire. In some cases, the police interrogate third parties who
are supposed to have knowledge of a theft or a murder and get some clues. Enquiry committees
appointed by governments generally adopt this method and get people’s views
and all possible details of facts relating to the enquiry. This method is suitable
whenever direct sources do not exist or cannot be relied upon or would be
unwilling to part with the information.

The validity of the results depends upon a few factors, such as the nature of the
person whose evidence is being recorded, the ability of the interviewer to draw
out information from the third parties by means of appropriate questions and
cross examinations, and the number of persons interviewed. For the success of
this method one person or one group alone should not be relied upon.
3. Information from correspondents

The investigator appoints local agents or correspondents in different places and
compiles the information sent by them. Information for newspapers and some
government departments comes by this method. The advantage of this method is
that it is cheap and appropriate for extensive investigations. But it may not
ensure accurate results, because the correspondents are likely to be negligent,
prejudiced and biased. This method is adopted in those cases where information is
to be collected periodically from a wide area over a long time.

4. Mailed questionnaire method


Under this method, a list of questions is prepared and is sent to all the informants
by post. The list of questions is technically called questionnaire. A covering letter
accompanying the questionnaire explains the purpose of the investigation and
the importance of correct information and requests the informants to fill in the
blank spaces provided and to return the form within a specified time. This
method is appropriate in those cases where the informants are literate and are
spread over a wide area.
Merits
1. It is relatively cheap.
2. It is preferable when the informants are spread over a wide area.
Limitations
1. The greatest limitation is that the informants must be literate and able
to understand and reply to the questions.
2. It is possible that some of the persons who receive the questionnaires do
not return them.
3. It is difficult to verify the correctness of the information furnished by
the respondents.

With a view to minimizing non-response and collecting correct information, the
questionnaire should be carefully drafted. There is no hard and fast rule, but the
following general principles may be helpful in framing the questionnaire. A
covering letter and a self-addressed, stamped envelope should accompany the
questionnaire. The covering letter should politely point out the purpose of the
survey and the privilege of the respondent, who is one among the few associated
with the investigation. It should assure the respondent that the information will
be kept confidential and will never be misused. It may promise a copy of the
findings, free gifts, concessions, etc.

Characteristics of a good questionnaire


1. Number of questions should be minimal.
2. Questions should be in logical order, moving from easy to more difficult
questions.
3. Questions should be short and simple. Technical terms and vague
expressions capable of different interpretations should be avoided.
4. Questions fetching YES or NO answers are preferable. There may be some
multiple-choice questions, but questions requiring lengthy answers are to
be avoided.
5. Personal questions and questions which require memory power and
calculations should also be avoided.
6. Questions should enable cross-checking, so that deliberate or unconscious
mistakes can be detected to an extent.
7. Questions should be carefully framed so as to cover the entire scope of the
survey.
8. The wording of the questions should be proper, without hurting feelings or
arousing resentment.
9. As far as possible, confidential information should not be sought.
10. The physical appearance should be attractive, and sufficient space should
be provided for answering each question.
5. Schedules sent through Enumerators
Under this method, enumerators or interviewers take the schedules, meet the
informants and fill in their replies. A distinction is often made between a
schedule and a questionnaire. A schedule is filled in by the interviewer in a
face-to-face situation with the informant. A questionnaire is filled in by the
informant, who receives it and returns it by post. This method is suitable for
extensive surveys.

Merits

1. It can be adopted even if the informants are illiterate.
2. Answers to questions of a personal and pecuniary nature can be collected.
3. Non-response is minimal, as enumerators go personally and contact the
informants.
4. The information collected is reliable, since the enumerators can be
properly trained for the purpose.
5. It is the most popular method.

Limitations

1. It is the costliest method.
2. Extensive training must be given to the enumerators for collecting correct
and uniform information.
3. Interviewing requires experience. Unskilled investigators are likely to fail in
their work.

Before the actual survey, a pilot survey is conducted, in which the
questionnaire or schedule is pre-tested. A few among the people from whom actual
information is needed are asked to reply. If they misunderstand a question, find
it difficult to answer or do not like its wording, it is to be altered. Further,
it is to be ensured that every question fetches the desired answer.
Merits and Demerits of Primary Data
1. The collection of data by the method of personal survey is possible only if
the area covered by the investigator is small. Collection of data by
sending enumerators is bound to be expensive. Care should be taken to
ensure that the enumerators record the correct information provided by the
informants.
2. Collection of primary data by framing a schedule or distributing and
collecting questionnaires by post is less expensive and can be completed
in a shorter time.
3. If the questions are embarrassing or of a complicated nature, or if they
probe into the personal affairs of individuals, the schedules may not be
filled in with accurate and correct information, and hence this method is
unsuitable.
4. Information collected as primary data is more reliable than that obtained
from secondary data.
3.3.2 Secondary Data:
Secondary data are those data which have been already collected and analyzed by
some earlier agency for its own use; and later the same data are used by a
different agency. According to W.A. Neiswanger, ‘A primary source is a
publication in which the data are published by the same authority which gathered
and analyzed them. A secondary source is a publication, reporting the data which
have been gathered by other authorities and for which others are responsible’.

Sources of Secondary data


In most studies the investigator finds it impracticable to collect first-hand
information on all related issues, and so he makes use of data collected by
others. There is a vast amount of published information from which statistical
studies may be made, and fresh statistics are constantly being produced. The
sources of secondary data can broadly be classified under two heads:

1. Published sources, and
2. Unpublished sources

1. Published Sources:
The various sources of published data are:

A. Reports and official publications of
(i) International bodies such as the International Monetary Fund,
International Finance Corporation and United Nations Organization.
(ii) Central and State Governments, such as the Report of the Tandon
Committee and the Pay Commission.
B. Semi-official publications of various local bodies such as Municipal
Corporations and District Boards.
C. Private publications, such as the publications of
(i) Trade and professional bodies such as the Federation of Indian
Chambers of Commerce and the Institute of Chartered Accountants.
(ii) Financial and economic journals such as ‘Commerce’, ‘Capital’ and
‘Indian Finance’.
(iii) Annual reports of joint stock companies.
(iv) Publications brought out by research agencies, research scholars,
etc.

It should be noted that the publications mentioned above vary with regard to the
periodicity of publication. Some are published at regular intervals (yearly,
monthly, weekly, etc.), whereas others are ad hoc publications, i.e. with no
regularity in their periodicity.
Note: A lot of secondary data is available on the internet. We can access it at
any time for further studies.

2. Unpublished Sources
All statistical material is not always published. There are various sources of
unpublished data such as records maintained by various Government and private
offices, studies made by research institutions, scholars, etc. Such sources can also
be used where necessary.

Precautions in the use of Secondary data


The following are some of the points that are to be considered in the use of
secondary data:

1. How the data have been collected and processed
2. The accuracy of the data
3. How far the data have been summarized
4. How comparable the data are with other tabulations
5. How to interpret the data, especially when figures collected for one
purpose are used for another

Generally speaking, with secondary data, people have to compromise between
what they want and what they are able to find.

Merits and Demerits of Secondary Data

1. Secondary data is cheap to obtain. Many government publications are
relatively cheap, and libraries stock quantities of secondary data produced
by the government, by companies and by other organisations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the available secondary data has been collected over many years,
and therefore it can be used to plot trends.
4. Secondary data is of value to:
- The government – help in making decisions and planning future policy.
- Business and industry – in areas such as marketing, and sales in order
to appreciate the general economic and social conditions and to
provide information on competitors.
- Research organizations – by providing social, economic and industrial
information.

3.4 Classification:
The collected data, also known as raw data or ungrouped data, are in an
unorganized form and need to be organized and presented in a meaningful and
readily comprehensible form in order to facilitate further statistical analysis.
It is, therefore, essential for an investigator to condense a mass of data into a
more comprehensible and assimilable form. The process of grouping data into
different classes or sub-classes according to some characteristics is known as
classification. Tabulation is concerned with the systematic arrangement and
presentation of classified data. Thus, classification is the first step in
tabulation.

For example, letters in the post office are classified according to their
destinations, viz. Delhi, Madurai, Bangalore, Mumbai, etc.
Objects of Classification
The following are the main objectives of classifying the data:

1. It condenses the mass of data into an easily assimilable form.
2. It eliminates unnecessary details.
3. It facilitates comparison and highlights the significant aspects of the data.
4. It enables one to get a mental picture of the information and helps in
drawing inferences.
5. It helps in the statistical treatment of the information collected.


Types of Classification
Statistical data are classified in respect of their
characteristics. Broadly there are four basic types of classification namely
a) Chronological classification
b) Geographical classification
c) Qualitative classification
d) Quantitative classification
a) Chronological Classification
In chronological classification, the collected data are arranged according to the
order of time, expressed in years, months, weeks, etc. The data are generally
classified in ascending order of time. For example, data relating to population,
the sales of a firm, or the imports and exports of a country are usually subjected
to chronological classification.
Example 5:
The estimates of birth rates in India during 1970-76 are:

Year          1970   1971   1972   1973   1974   1975   1976
Birth rate    36.8   36.9   36.6   34.6   34.5   35.2   34.2

b) Geographical Classification
In this type of classification, the data are classified according to geographical
region or place, for instance, the production of paddy in different states of
India, the production of wheat in different countries, etc.

Example 6:

Country                     America   China   Denmark   France   India
Yield of wheat (kg/acre)    1925      893     225       439      862

c) Qualitative Classification
In this type of classification, data are classified on the basis of some attribute
or quality, like sex, literacy, religion, employment, etc. Such attributes cannot
be measured on a scale.

For example, if the population is to be classified with respect to one attribute,
say sex, then we can classify it into two classes, namely males and females.
Similarly, it can also be classified into ‘employed’ or ‘unemployed’ on the basis
of another attribute, ‘employment’.

Thus, when the classification is done with respect to one attribute, which is
dichotomous in nature, two classes are formed, one possessing the attribute and
the other not possessing the attribute. This type of classification is called simple
or dichotomous classification.

A simple classification may be shown as under:

Population
    |-- Male
    |-- Female

The classification in which two or more attributes are considered and several
classes are formed is called a manifold classification. For example, if we
classify a population simultaneously with respect to two attributes, e.g. sex and
employment, the population is first classified with respect to ‘sex’ into ‘males’
and ‘females’. Each of these classes may then be further classified into
‘employed’ and ‘unemployed’ on the basis of the attribute ‘employment’, and as
such the population is classified into four classes, namely:

(i) Male employed
(ii) Male unemployed
(iii) Female employed
(iv) Female unemployed

The classification may be extended still further by considering other attributes
like marital status, etc. The four-class classification above can be shown by the
following chart:

Population
    |-- Male
    |     |-- Employed
    |     |-- Unemployed
    |-- Female
          |-- Employed
          |-- Unemployed
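Simple and manifold classification amount to counting records by one attribute or by a combination of attributes. Below is a minimal Python sketch using `collections.Counter`. The six records are hypothetical and serve only to show the mechanics; they are not data from the text.

```python
from collections import Counter

# Hypothetical records of (sex, employment status); not data from the text.
people = [
    ("Male", "Employed"), ("Female", "Unemployed"), ("Male", "Employed"),
    ("Female", "Employed"), ("Male", "Unemployed"), ("Female", "Employed"),
]

# Simple (dichotomous) classification on one attribute: sex.
by_sex = Counter(sex for sex, _ in people)

# Manifold classification on two attributes: sex, then employment.
by_sex_and_employment = Counter(people)

for sex in ("Male", "Female"):
    print(sex, by_sex[sex])
    for status in ("Employed", "Unemployed"):
        print("   ", status, by_sex_and_employment[(sex, status)])
```

Each additional attribute, such as marital status, would simply add one more element to each record tuple, giving a finer manifold classification.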

d) Quantitative classification
Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height, weight, etc. For example, the
students of a college may be classified according to weight as given below.

Weight (in lbs)    No. of students
90-100               50
100-110             200
110-120             260
120-130             360
130-140              90
140-150              40
Total              1000

In this type of classification there are two elements, namely (i) the variable,
i.e. the weight in the above example, and (ii) the frequency, i.e. the number of
students in each class. There are 50 students having weights ranging from 90 to
100 lb, 200 students having weights ranging from 100 to 110 lb, and so on.
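Building such a frequency table from individual measurements can be sketched in Python. The text does not say which class a boundary value such as 100 lb belongs to. This sketch assumes the common convention that each class includes its lower limit and excludes its upper limit, so 100 is counted in 100-110. The weights listed are hypothetical, and the function name `frequency_table` is illustrative.

```python
def frequency_table(values, lower, upper, width):
    """Count values into classes [lower, lower + width), ..., up to `upper`.

    A value equal to an upper class limit is counted in the next class;
    values outside [lower, upper) are ignored.
    """
    classes = [(a, a + width) for a in range(lower, upper, width)]
    counts = {c: 0 for c in classes}
    for v in values:
        for a, b in classes:
            if a <= v < b:
                counts[(a, b)] += 1
                break
    return counts


weights = [92, 100, 105, 118, 121, 125, 129, 133, 141, 99]   # hypothetical, in lb
table = frequency_table(weights, 90, 150, 10)
for (a, b), f in table.items():
    print(f"{a}-{b}: {f}")
print("Total:", sum(table.values()))
```

The two elements named in the text appear directly: the class intervals of the variable are the dictionary keys, and the frequencies are the values.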

3.5 Tabulation
Tabulation is the process of summarizing classified or grouped data in the form of
a table so that it is easily understood and an investigator is quickly able to locate
the desired information. A table is a systematic arrangement of classified data in
columns and rows. Thus, a statistical table makes it possible for the investigator
to present a huge mass of data in a detailed and orderly form. It facilitates
comparison and often reveals certain patterns in data which are otherwise not
obvious. Classification and tabulation, as a matter of fact, are not two distinct
processes; they go together. Before tabulation, data are classified and then
displayed under different columns and rows of a table.
Advantages of Tabulation:
Statistical data arranged in a tabular form serve the following objectives:

1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages,
dispersion, correlation etc.
4. It presents facts in minimum possible space and unnecessary repetitions
and explanations are avoided. Moreover, the needed information can be
easily located.
5. Tabulated data are good for references and they make it easier to present
the information in the form of graphs and diagrams.
Preparing a Table
The making of a compact table is itself an art. A table should contain all the
information needed within the smallest possible space. The purpose of tabulation
and the way the tabulated information is to be used are the main points to be
kept in mind while preparing a statistical table. An ideal table should consist
of the following main parts:

1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data

Table Number
A table should be numbered for easy reference and identification. This number, if
possible, should be written in the centre at the top of the table. Sometimes it is
also written just before the title of the table.
Title
A good table should have a clearly worded, brief but unambiguous title explaining
the nature of data contained in the table. It should also state arrangement of data
and the period covered. The title should be placed centrally on the top of a table
just below the table number (or just after table number in the same line).
Captions or Column Headings
Captions in a table stand for brief and self-explanatory headings of vertical
columns. Captions may involve headings and sub-headings as well. The unit of the
data contained should also be given for each column. Usually, a relatively less
important and shorter classification should be tabulated in the columns.

Stubs or Row Designations
Stubs stand for brief and self-explanatory headings of horizontal rows. Normally,
a relatively more important classification is given in rows. Also, a variable with
a large number of classes is usually represented in rows. For example, rows may
stand for score classes and columns for the sex of students. There will then be
many rows for score classes but only two columns, for male and female students.
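A model structure of a table is given below:

Table Number: Title of the Table

| Stub Heading | Caption Heading: Sub-heading | Caption Heading: Sub-heading | Caption Heading: Sub-heading | Total |
| Stub headings (sub-stubs) | Body | Body | Body | |
| Total | | | | |

Footnotes:
Source note: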
Body
The body of the table contains the numerical information, i.e. the frequencies of
observations in the different cells. These data are arranged according to the
captions and stubs.
Footnotes

Footnotes are given at the foot of the table to explain any fact or
information in the table that needs explanation. Thus, they explain or
provide further details about the data that have not been covered in the title,
captions and stubs.

Sources of data
Lastly, one should also mention the source from which the data are
taken. This may preferably include the name of the author, volume, page and
year of publication. It should also state whether the data contained in the table
are of a primary or secondary nature.
Requirements of a Good Table
A good statistical table is not merely a careless grouping of columns and rows;
it should summarize the total information in an easily accessible
form in the minimum possible space. Thus, while preparing a table, one must have a
clear idea of the information to be presented, the facts to be compared and the
points to be stressed.

Though there is no hard and fast rule for forming a table, a few general points
should be kept in mind:

1. A table should be formed in keeping with the objectives of the statistical enquiry.
2. A table should be carefully prepared so that it is easily understandable.
3. A table should be formed so as to suit the size of the paper. But such an
adjustment should not be at the cost of legibility.
4. If the figures in the table are large, they should be suitably rounded or
approximated. The method of approximation and units of measurements
too should be specified.
5. Rows and columns in a table should be numbered and certain figures to be
stressed may be put in ‘box’ or ‘circle’ or in bold letters.
6. The arrangements of rows and columns should be in a logical and
systematic order. This arrangement may be alphabetical, chronological or
according to size.

7. The rows and columns are separated by single, double or thick lines to
represent various classes and sub-classes used. The corresponding
proportions or percentages should be given in adjoining rows and columns
to enable comparison. A vertical expansion of the table is generally more
convenient than the horizontal one.
8. The averages or totals of the rows should be given at the right of the
table and those of the columns at the bottom of the table. Totals for every sub-class
should also be given.
9. In case it is not possible to accommodate all the information in a single
table, it is better to have two or more related tables.
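Items 7 and 8 above ask for totals at the right of each row and at the foot of each column. As a minimal Python sketch of that bookkeeping (the function name `add_totals` and the counts are hypothetical, chosen only for illustration):

```python
def add_totals(table):
    """Append a total to each row and a final row of column totals.

    The last cell of the final row is the grand total.
    """
    with_row_totals = [row + [sum(row)] for row in table]
    column_totals = [sum(column) for column in zip(*with_row_totals)]
    return with_row_totals + [column_totals]


# Hypothetical counts: rows are occupations, columns are male and female.
counts = [[12, 8],
          [5, 9],
          [7, 3]]

for row in add_totals(counts):
    print(row)
```

Because the column totals are taken after the row totals have been appended, the bottom-right cell automatically holds the grand total.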

Types of Tables
Tables can be classified according to their purpose, stage of enquiry, nature of
data or number of characteristics used. On the basis of the number of
characteristics, tables may be classified as follows:

1. Simple or one-way table
2. Two-way table
3. Manifold table

Simple or one-way Table

A simple or one-way table is the simplest table; it contains data on one
characteristic only. A simple table is easy to construct and simple to follow. For
example, the blank table given below may be used to show the number of adults
in different occupations in a locality.

The number of adults in different occupations in a locality

| Occupations | No. of Adults |
| Total | |
Two-way Table
A table which contains data on two characteristics is called a two-way table. In
such a case, either the stub or the caption is divided into two co-ordinate parts.
For example, the caption of the table above may be further divided with respect to
‘sex’. This subdivision is shown in the two-way table below, which now contains two
characteristics, namely occupation and sex.

The number of adults in a locality in respect of occupation and sex

| Occupation | No. of Adults: Male | No. of Adults: Female | Total |
| Total | | | |

Manifold Table
More and more complex tables can be formed by including other
characteristics. For example, we may further classify the caption sub-headings in
the above table in respect of “marital status”, “religion”, “socio-economic
status”, etc. A table which has more than two characteristics of data is called
a manifold table. For instance, the table shown below has three characteristics,
namely occupation, sex and marital status.

| Occupation | Male: M | Male: U | Male: Total | Female: M | Female: U | Female: Total | Total |
| Total | | | | | | | |

Footnote: M stands for married and U stands for unmarried.

Manifold tables, though complex, are good in practice, as they enable full
information to be incorporated and facilitate analysis of all related facts. Still, as a
normal practice, not more than four characteristics should be represented in one
table, to avoid confusion. Other related tables may be formed to show the
remaining characteristics.
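The three kinds of table differ only in how many characteristics each observation is classified by. A minimal Python sketch using `collections.Counter`, assuming each adult is recorded as a dictionary with the hypothetical keys `occupation`, `sex` and `married` (the records themselves are made up):

```python
from collections import Counter


def cross_tabulate(records, *characteristics):
    """Count records by one or more characteristics.

    One characteristic gives a simple (one-way) table, two give a two-way
    table, and three or more give a manifold table. Each key in the result
    is a tuple holding one value per characteristic.
    """
    return Counter(tuple(record[c] for c in characteristics) for record in records)


# Hypothetical survey records of adults in a locality.
adults = [
    {"occupation": "Trader", "sex": "Male", "married": "M"},
    {"occupation": "Trader", "sex": "Female", "married": "U"},
    {"occupation": "Farmer", "sex": "Male", "married": "M"},
    {"occupation": "Farmer", "sex": "Male", "married": "U"},
    {"occupation": "Trader", "sex": "Female", "married": "M"},
]

one_way = cross_tabulate(adults, "occupation")
two_way = cross_tabulate(adults, "occupation", "sex")
manifold = cross_tabulate(adults, "occupation", "sex", "married")

print(one_way[("Trader",)])
print(two_way[("Trader", "Female")])
print(manifold[("Farmer", "Male", "U")])
```

A `Counter` returns 0 for a combination that never occurs, which corresponds to an empty cell in the printed table.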

CHAPTER FOUR
FREQUENCY DISTRIBUTION
4.1 Introduction
A frequency distribution is a series in which observations with similar or
closely related values are put into separate bunches or groups, the groups being
arranged in order of magnitude. It is simply a table in which the data are grouped
into classes and the number of cases falling in each class is recorded. It
shows the frequency of occurrence of the different values of a single phenomenon.

A frequency distribution is constructed for three main reasons:

1. To facilitate the analysis of data.
2. To estimate frequencies of the unknown population distribution from
the distribution of sample data.
3. To facilitate the computation of various statistical measures.
4.2 Raw data:
The statistical data collected are generally raw data or ungrouped data. Let us
consider the daily wages (in Rs) of 30 labourers in a factory.
80 70 55 50 60 65 40 30 80 90
75 45 35 65 70 80 82 55 65 80
60 55 38 65 75 85 90 65 45 75

The above figures are nothing but raw or ungrouped data, recorded
as they occur without any pre-consideration. This representation of data does
not furnish any useful information and is rather confusing. A better way
is to arrange the figures in ascending or descending order of magnitude; this arrangement is
commonly known as an array. But an array does not reduce the bulk of the data. The
above data, when formed into an array, take the following form:

30 35 38 40 45 45 50 55 55 55
60 60 65 65 65 65 65 70 70 75
75 75 80 80 80 80 82 85 90 90
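Forming an array is simply sorting. A minimal Python sketch using the wage data above and the built-in `sorted` (nothing else is assumed):

```python
# Daily wages (in Rs) of the 30 labourers, as recorded (raw data).
wages = [80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
         75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
         60, 55, 38, 65, 75, 85, 90, 65, 45, 75]

array = sorted(wages)                     # ascending order of magnitude
descending = sorted(wages, reverse=True)  # descending order of magnitude

# The first and last items of the array are the minimum and maximum.
print(array[0], array[-1])
print(array)
```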

The array helps us to see at once the maximum and minimum values. It also
gives a rough idea of the distribution of the items over the range. When we have
a large number of items, however, forming an array is very difficult, tedious and
cumbersome. The data should then be condensed for better understanding. This
may be done in two ways, depending on the nature of the data.

a) Discrete (or) Ungrouped Frequency Distribution

In this form of distribution, the frequency refers to discrete values. Here the data
are presented in a way that exact measurements of units are clearly indicated.

There is a definite difference between the values of different groups of items.
Each class is distinct and separate from the other classes, and there is no continuity
from one class to the next. Examples are data such as the number of rooms in a
house, the number of companies registered in a country and the number of children
in a family.

The process of preparing this type of distribution is very simple. We just have to
count the number of times a particular value is repeated, which is called the
frequency of that class. To facilitate counting, a column of tallies is prepared.

In another column, place all possible values of the variable from the lowest to the
highest. Then put a bar (vertical line) opposite the particular value to which each
observation relates.

To facilitate counting, the bars are grouped in blocks of five, and some space is left
between the blocks. Finally, we count the number of bars to get the frequency.

Example 1
In a survey of 40 families in a village, the number of children per family was
recorded and the following data obtained.

1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5

Represent the data in the form of a discrete frequency distribution.

Solution:
Frequency distribution of the number of children

| Number of children | Frequency |
| 0 | 3 |
| 1 | 7 |
| 2 | 10 |
| 3 | 8 |
| 4 | 6 |
| 5 | 4 |
| 6 | 2 |
| Total | 40 |
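The tally procedure can be mimicked in Python with `collections.Counter`, which counts how often each value occurs. This sketch uses the 40 observations of Example 1 and lists every value from the lowest to the highest, as the tally method does:

```python
from collections import Counter

# Number of children in each of the 40 families (Example 1).
children = [1, 0, 3, 2, 1, 5, 6, 2,
            2, 1, 0, 3, 4, 2, 1, 6,
            3, 2, 1, 5, 3, 3, 2, 4,
            2, 2, 3, 0, 2, 1, 4, 5,
            3, 3, 4, 4, 1, 2, 4, 5]

frequency = Counter(children)  # value -> number of times it occurs

# List every possible value from the lowest to the highest.
for value in range(min(children), max(children) + 1):
    print(value, frequency[value])
print("Total", sum(frequency.values()))
```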
b) Continuous Frequency Distribution
In this form of distribution, the frequencies refer to groups of values. This becomes
necessary for variables which can take any fractional value and for which
an exact measurement is not possible. A discrete variable can also be
presented in the form of a continuous frequency distribution.

Wage distribution of 100 employees

Weekly wages (Rs) Number of employees

50-100 4
100-150 12
150-200 22
200-250 33
250-300 16
300-350 8
350-400 5
Total 100

4.3 Nature of Class

The following are some basic technical terms used when a continuous frequency
distribution is formed, i.e. when data are classified according to class intervals.

a) Class Limits

The class limits are the lowest and the highest values that can be included in the
class. For example, take the class 30-40. The lowest value of the class is 30 and
the highest value is 40. The two boundaries of a class are known as the lower limit and
the upper limit of the class. The lower limit of a class is the value below which
there can be no item in the class. The upper limit of a class is the value above
which there can be no item in that class. In the class 60-79, 60 is the lower limit
and 79 is the upper limit, i.e. in this class there can be no value which is less than
60 or more than 79. The way in which class limits are stated depends upon the
nature of the data. In statistical calculations, the lower class limit is denoted by L and
the upper class limit by U.

b) Class Interval

The class interval may be defined as the size of each grouping of data. For
example, 50-75, 75-100, 100-125…are class intervals. Each grouping begins with
the lower limit of a class interval and ends at the lower limit of the next
succeeding class interval.
c) Width or size of the class interval:
The difference between the upper and lower class limits is called the width or size of
the class interval and is denoted by ‘C’, i.e. C = U − L.

d) Range:

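The difference between the largest and the smallest values of the observations is called
the range and is denoted by ‘R’, i.e.

$$R = \text{Largest value} - \text{Smallest value} = L - S$$

where L here denotes the largest value (not the lower class limit) and S the smallest value.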

e) Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is
found by adding the upper and lower limits of a class and dividing the sum by
2, i.e.

$$\text{Mid-value} = \frac{L + U}{2}$$

For example, if the class interval is 20-30, the mid-value is (20 + 30)/2 = 25.
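Both the mid-value and the width of a class are simple arithmetic on its limits L and U. A small Python sketch (the helper names are hypothetical):

```python
def mid_value(lower, upper):
    """Mid-value of a class: (L + U) / 2."""
    return (lower + upper) / 2


def class_width(lower, upper):
    """Width (size) of a class interval: C = U - L."""
    return upper - lower


print(mid_value(20, 30))    # 25.0
print(class_width(20, 30))  # 10
```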

f) Frequency
The number of observations falling within a particular class interval is called the
frequency of that class.

Let us consider the frequency distribution of weights of persons working in a
company.

| Weight (in kg) | Number of persons |
| 30-40 | 25 |
| 40-50 | 53 |
| 50-60 | 77 |
| 60-70 | 95 |
| 70-80 | 80 |
| 80-90 | 60 |
| 90-100 | 30 |
| Total | 420 |
In the above example, the class frequencies are 25, 53, 77, 95, 80, 60 and 30. The total
frequency is 420. The total frequency indicates the total number of
observations considered in a frequency distribution.

g) Number of class intervals

The number of class intervals in a frequency distribution is a matter of importance. The
number of class intervals should not be too large. For an ideal frequency distribution, the
number of class intervals can vary from 5 to 15. To decide the number of class
intervals for the frequency distribution of the whole data, we choose the lowest
and the highest of the values. The difference between them will enable us to
decide the class intervals.

Thus, the number of class intervals can be fixed arbitrarily, keeping in view the
nature of the problem under study, or it can be decided with the help of Sturges’
rule. According to this rule, the number of classes can be determined by the formula

$$K = 1 + 3.322\log_{10} N$$

where N = total number of observations,

log10 = logarithm to base 10,

K = number of class intervals.

Thus, if the number of observations is 10, the number of class intervals is

$$K = 1 + 3.322\log_{10} 10 = 4.322 \approx 4$$

If 100 observations are being studied, the number of class intervals is

$$K = 1 + 3.322\log_{10} 100 = 7.644 \approx 8$$

and so on.
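Sturges’ rule can be evaluated directly with `math.log10`. A minimal sketch that reproduces the two cases above; rounding K to the nearest whole number is an assumption here, since the text only writes ≅:

```python
import math


def sturges_classes(n):
    """Number of class intervals by Sturges' rule: K = 1 + 3.322 log10 N."""
    return 1 + 3.322 * math.log10(n)


for n in (10, 100):
    k = sturges_classes(n)
    print(n, round(k, 3), round(k))
```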


h) Size of the class interval
The size of the class interval is inversely proportional to the number of class
intervals in a given distribution. Hence the approximate value of the size (or width or
magnitude) of the class interval ‘C’ is obtained using Sturges’ rule as

$$C = \frac{\text{Range}}{\text{Number of class intervals}} = \frac{\text{Range}}{1 + 3.322\log_{10} N}$$

where Range = largest value − smallest value in the distribution.
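As an illustration that is not worked in the text, the same rule can be applied to the wage data of Section 4.2 (30 observations, range 90 − 30 = 60). A minimal Python sketch (the helper name `sturges_width` is hypothetical):

```python
import math


def sturges_width(values):
    """Approximate class width C = Range / (1 + 3.322 log10 N)."""
    value_range = max(values) - min(values)
    k = 1 + 3.322 * math.log10(len(values))
    return value_range / k


# Daily wages (in Rs) of the 30 labourers from Section 4.2.
wages = [80, 70, 55, 50, 60, 65, 40, 30, 80, 90,
         75, 45, 35, 65, 70, 80, 82, 55, 65, 80,
         60, 55, 38, 65, 75, 85, 90, 65, 45, 75]

c = sturges_width(wages)
print(round(c, 2))  # about 10.16
```

The result of roughly 10 suggests class intervals of width 10, which also agrees with the preference for multiples of 5 given in Section 4.5.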

4.4 Types of class intervals

There are three methods of classifying the data according to class intervals,
namely:
a) Exclusive method
b) Inclusive method
c) Open-end classes
a) Exclusive method
When the class intervals are so fixed that the upper limit of one class is the lower
limit of the next class, the method is known as the exclusive method of classification. The
following data are classified on this basis.
Expenditure (Rs.) No. of families

0 - 5000 60
5000-10000 95
10000-15000 122
15000-20000 83
20000-25000 40
Total 400

It is clear that the exclusive method ensures continuity of data, inasmuch as the
upper limit of one class is the lower limit of the next class. In the above example,
there are 60 families whose expenditure is between Rs.0 and Rs.4999.99. A
family whose expenditure is Rs.5000 would be included in the class interval
5000-10000. This method is widely used in practice.
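Assigning a value to an exclusive class amounts to finding the last lower limit that does not exceed it. A minimal Python sketch using the standard `bisect` module and the expenditure classes above (the function name `exclusive_class` is hypothetical):

```python
import bisect

# Lower limits of the exclusive classes 0-5000, 5000-10000, ..., 20000-25000.
lower_limits = [0, 5000, 10000, 15000, 20000]
upper_end = 25000  # upper limit of the last class


def exclusive_class(value):
    """Return (lower, upper) of the exclusive class containing value.

    A value equal to an upper limit belongs to the next class, so the
    class satisfies lower <= value < upper.
    """
    if not lower_limits[0] <= value < upper_end:
        raise ValueError("value outside the classified range")
    i = bisect.bisect_right(lower_limits, value) - 1
    upper = lower_limits[i + 1] if i + 1 < len(lower_limits) else upper_end
    return lower_limits[i], upper


print(exclusive_class(4999.99))  # (0, 5000)
print(exclusive_class(5000))     # (5000, 10000)
```

`bisect_right` places a value equal to a limit after that limit, which is exactly the exclusive rule that Rs.5000 goes into 5000-10000.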

b) Inclusive method
In this method, the overlapping of the class intervals is avoided: both the lower
and upper limits are included in the class interval. This type of classification may
be used for a grouped frequency distribution of a discrete variable, like the number of
members in a family or the number of workers in a factory, where the variable may take only
integral values. It cannot be used with fractional values like age, height, weight,
etc.

This method may be illustrated as follows:


Class interval Frequency


5- 9 7
10-14 12
15-19 15
20-29 21
30-34 10
35-39 5
Total 70
Thus, to decide whether to use the inclusive or the exclusive method, it
is important to determine whether the variable under observation is
continuous or discrete. For continuous variables, the exclusive
method must be used; the inclusive method should be used for discrete
variables.

c) Open end classes:

In open-end classes, a class limit is missing at the lower end of the first class
interval, at the upper end of the last class interval, or at both ends. The need for
open-end classes arises in a number of practical situations, particularly with
economic and medical data, when there are a few very high or very low
values which are far apart from the majority of observations.
An example of open-end classes is given below:

Salary Range No of workers

Below 2000 7
2000 – 4000 5
4000 – 6000 6
6000 – 8000 4
8000 and above 3

4.5 Construction of Frequency Table

Constructing a frequency distribution depends on the nature of the given data.
Hence, the following general considerations may be borne in mind to ensure
a meaningful classification of data.

1. The number of classes should preferably be between 5 and 20. However,
there is no rigidity about it.
2. As far as possible, one should avoid class intervals such as 3, 7, 11,
26, etc. Preferably, one should have class intervals of five or
multiples of 5, like 10, 20, 25, 100, etc.
3. The starting point, i.e. the lower limit of the first class, should be zero,
5 or a multiple of 5.
4. To ensure continuity and to get correct class intervals, we should adopt the
“exclusive” method.
5. Wherever possible, it is desirable to use class intervals of equal size.

4.6 Preparation of Frequency Table

The presentation of data in the form of a frequency distribution describes the basic
pattern which the data assume in the mass. A frequency distribution gives a
better picture of the pattern of the data if the number of items is large. If the identity
of the individuals about whom the information is collected is not relevant,
the first step of condensation is to divide the observed range of the variable
into a suitable number of class intervals and to record the number of
observations in each class. Let us consider the weights in kg of 50 college
students.

42 62 46 54 41 37 54 44 32 45
47 50 58 49 51 42 46 37 42 39
54 39 51 58 47 64 43 48 49 48
49 61 41 40 58 49 59 57 57 34
56 38 45 52 46 40 63 41 51 41

Here the largest weight is 64 kg and the smallest is 32 kg, so the range is 32. The size
of the class interval as per Sturges’ rule is obtained as follows:

$$C = \frac{\text{Range}}{1 + 3.322\log_{10} N} = \frac{32}{1 + 3.322\log_{10} 50} = \frac{32}{6.64} \approx 4.8 \approx 5$$

Thus, the number of class intervals is 7 (since 1 + 3.322 log10 50 ≈ 6.64) and the size of
each class is 5. The required frequency distribution, prepared using tally marks, is
given below:

| Class interval (kg) | Frequency |
| 30-35 | 2 |
| 35-40 | 5 |
| 40-45 | 11 |
| 45-50 | 13 |
| 50-55 | 8 |
| 55-60 | 7 |
| 60-65 | 4 |
| Total | 50 |
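The tallying step can be checked mechanically. A minimal Python sketch, assuming exclusive classes of width 5 starting at 30 as chosen above; the helper `grouped_frequencies` is hypothetical:

```python
def grouped_frequencies(values, start, width, classes):
    """Count values in exclusive classes [start + i*width, start + (i+1)*width).

    Values falling outside all the classes are ignored.
    """
    counts = [0] * classes
    for v in values:
        i = (v - start) // width
        if 0 <= i < classes:
            counts[i] += 1
    return counts


# Weights in kg of the 50 college students.
weights = [42, 62, 46, 54, 41, 37, 54, 44, 32, 45,
           47, 50, 58, 49, 51, 42, 46, 37, 42, 39,
           54, 39, 51, 58, 47, 64, 43, 48, 49, 48,
           49, 61, 41, 40, 58, 49, 59, 57, 57, 34,
           56, 38, 45, 52, 46, 40, 63, 41, 51, 41]

counts = grouped_frequencies(weights, start=30, width=5, classes=7)
for i, f in enumerate(counts):
    lower = 30 + 5 * i
    print(f"{lower}-{lower + 5}", f)
print("Total", sum(counts))
```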

Example 2:
Given below are the numbers of tools produced by workers in a factory.

43 18 25 18 39 44 19 20 20 26
40 45 38 25 13 14 27 41 42 17
34 31 32 27 33 37 25 26 32 25
33 34 35 46 29 34 31 34 35 24
28 30 41 32 29 28 30 31 30 34
31 35 36 29 26 32 36 35 36 37
32 23 22 29 33 37 33 27 24 36
23 42 29 37 29 23 44 41 45 39
21 21 42 22 28 22 15 16 17 28
22 29 35 31 27 40 23 32 40 37

Construct a frequency distribution with inclusive class intervals. Also, find:

1. How many workers produced more than 38 tools?
2. How many workers produced less than 23 tools?
Solution:
Using Sturges’ formula for determining the number of class intervals, we have

$$\text{Number of class intervals} = 1 + 3.322\log_{10} N = 1 + 3.322\log_{10} 100 \approx 7.6$$

The largest value is 46 and the smallest is 13, so the range is 46 − 13 = 33 and

$$\text{Size of class interval} = \frac{\text{Range}}{\text{Number of class intervals}} = \frac{33}{7.6} \approx 4.3 \approx 5$$

Hence, taking the magnitude of the class intervals as 5, we have the 7 inclusive classes
13-17, 18-22… 43-47. Using tally marks, the required frequency distribution is obtained in
the following table:

| Number of tools produced (class interval) | Number of workers (frequency) |
| 13-17 | 6 |
| 18-22 | 11 |
| 23-27 | 17 |
| 28-32 | 25 |
| 33-37 | 23 |
| 38-42 | 12 |
| 43-47 | 6 |
| Total | 100 |
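The inclusive classes and the two questions can be checked directly against the raw data. A minimal Python sketch (variable names are illustrative); note that with inclusive classes both limits belong to the class:

```python
tools = [43, 18, 25, 18, 39, 44, 19, 20, 20, 26,
         40, 45, 38, 25, 13, 14, 27, 41, 42, 17,
         34, 31, 32, 27, 33, 37, 25, 26, 32, 25,
         33, 34, 35, 46, 29, 34, 31, 34, 35, 24,
         28, 30, 41, 32, 29, 28, 30, 31, 30, 34,
         31, 35, 36, 29, 26, 32, 36, 35, 36, 37,
         32, 23, 22, 29, 33, 37, 33, 27, 24, 36,
         23, 42, 29, 37, 29, 23, 44, 41, 45, 39,
         21, 21, 42, 22, 28, 22, 15, 16, 17, 28,
         22, 29, 35, 31, 27, 40, 23, 32, 40, 37]

# Inclusive classes 13-17, 18-22, ..., 43-47.
classes = [(lower, lower + 4) for lower in range(13, 48, 5)]
table = {(lo, hi): sum(lo <= t <= hi for t in tools) for lo, hi in classes}

for (lo, hi), f in table.items():
    print(f"{lo}-{hi}", f)

# The two questions, answered from the raw data rather than the classes.
more_than_38 = sum(t > 38 for t in tools)
less_than_23 = sum(t < 23 for t in tools)
print(more_than_38, less_than_23)
```

Counting from the raw data matters for question 1: the class 38-42 contains one worker who produced exactly 38 tools, who must not be counted as "more than 38".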

4.7 Percentage frequency table:

Comparison becomes difficult, and at times impossible, when the total
numbers of items are large and differ greatly from one distribution to another. Under
these circumstances, a percentage frequency distribution facilitates easy
comparison. In a percentage frequency table, we convert the actual
frequencies into percentages, using the formula given below:

$$\text{Percentage frequency} = \frac{\text{Actual frequency}}{\text{Total frequency}} \times 100$$

It is also called a relative frequency table.

An example of a percentage frequency table is given below.

| Marks | No. of students (frequency) | Percentage frequency |
| 0-10 | 3 | 6 |
| 10-20 | 8 | 16 |
| 20-30 | 12 | 24 |
| 30-40 | 17 | 34 |
| 40-50 | 6 | 12 |
| 50-60 | 4 | 8 |
| Total | 50 | 100 |
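The formula translates directly into Python. A minimal sketch using the marks example; multiplying by 100 before dividing keeps these particular results exact:

```python
def percentage_frequencies(frequencies):
    """Convert frequencies to percentages of the total frequency."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


students = [3, 8, 12, 17, 6, 4]  # marks 0-10, 10-20, ..., 50-60
print(percentage_frequencies(students))
```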
4.8 Cumulative Frequency Table
A cumulative frequency distribution gives a running total of the frequencies. It is
constructed by adding the frequency of the first class interval to the frequency of
the second class interval, then adding that total to the frequency of the third class
interval, and so on, until the final total, appearing opposite the last class
interval, is the total of all frequencies. The cumulation may be downward or upward.
A downward cumulation results in a list presenting the number of observations
“less than” any given amount, as given by the upper limit of each class (the lower
limit of the succeeding class interval). An upward cumulation results in a list
presenting the number of observations “more than” any given amount, as given
by the lower limit of each class (the upper limit of the preceding class interval).
Example 3:

| Age group (in years) | Number of women | Less than cumulative frequency | More than cumulative frequency |
| 15-20 | 3 | 3 | 64 |
| 20-25 | 7 | 10 | 61 |
| 25-30 | 15 | 25 | 54 |
| 30-35 | 21 | 46 | 39 |
| 35-40 | 12 | 58 | 18 |
| 40-45 | 6 | 64 | 6 |

(a) Less than cumulative frequency distribution table

| Upper limit | Less than cumulative frequency |
| Less than 20 | 3 |
| Less than 25 | 10 |
| Less than 30 | 25 |
| Less than 35 | 46 |
| Less than 40 | 58 |
| Less than 45 | 64 |

(b) More than cumulative frequency distribution table

| Lower limit | More than cumulative frequency |
| 15 and above | 64 |
| 20 and above | 61 |
| 25 and above | 54 |
| 30 and above | 39 |
| 35 and above | 18 |
| 40 and above | 6 |
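Both kinds of cumulative frequency are running totals, taken in opposite directions. A minimal Python sketch with `itertools.accumulate`, using the data of Example 3:

```python
from itertools import accumulate

women = [3, 7, 15, 21, 12, 6]  # age groups 15-20, 20-25, ..., 40-45
lower_limits = [15, 20, 25, 30, 35, 40]
upper_limits = [20, 25, 30, 35, 40, 45]

# Less than: running total from the first class downwards.
less_than = list(accumulate(women))
# More than: running total from the last class upwards, then back in order.
more_than = list(accumulate(reversed(women)))[::-1]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper}: {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"{lower} and above: {cf}")
```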
4.8.1 Conversion of cumulative frequency to simple Frequency:
If we have only the cumulative frequencies, either ‘less than’ or ‘more than’, we can
convert them into simple frequencies. For example, ‘less than’ cumulative
frequencies can be converted to simple frequencies by the method given below:

| Class interval | ‘Less than’ cumulative frequency | Simple frequency |
| 15-20 | 3 | 3 |
| 20-25 | 10 | 10 − 3 = 7 |
| 25-30 | 25 | 25 − 10 = 15 |
| 30-35 | 46 | 46 − 25 = 21 |
| 35-40 | 58 | 58 − 46 = 12 |
| 40-45 | 64 | 64 − 58 = 6 |

The method of converting ‘more than’ cumulative frequencies to simple frequencies is
given below.

| Class interval | ‘More than’ cumulative frequency | Simple frequency |
| 15-20 | 64 | 64 − 61 = 3 |
| 20-25 | 61 | 61 − 54 = 7 |
| 25-30 | 54 | 54 − 39 = 15 |
| 30-35 | 39 | 39 − 18 = 21 |
| 35-40 | 18 | 18 − 6 = 12 |
| 40-45 | 6 | 6 − 0 = 6 |
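Converting back is a matter of taking successive differences, exactly as in the two tables above. A minimal Python sketch (the function names are hypothetical):

```python
def from_less_than(cumulative):
    """Simple frequencies from 'less than' cumulative frequencies."""
    previous = [0] + cumulative[:-1]
    return [c - p for p, c in zip(previous, cumulative)]


def from_more_than(cumulative):
    """Simple frequencies from 'more than' cumulative frequencies."""
    following = cumulative[1:] + [0]
    return [c - n for c, n in zip(cumulative, following)]


print(from_less_than([3, 10, 25, 46, 58, 64]))  # [3, 7, 15, 21, 12, 6]
print(from_more_than([64, 61, 54, 39, 18, 6]))  # [3, 7, 15, 21, 12, 6]
```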
4.9 Cumulative percentage Frequency Table
If cumulative percentages are given instead of cumulative frequencies, the
distribution is called a cumulative percentage frequency distribution. We can form
this table either by converting the frequencies into percentages and then
cumulating them, or by converting the given cumulative frequencies into percentages.

Example 4:

| Income (in Rs) | No. of families | Cumulative frequency | Cumulative percentage |
| 2000-4000 | 8 | 8 | 5.7 |
| 4000-6000 | 15 | 23 | 16.4 |
| 6000-8000 | 27 | 50 | 35.7 |
| 8000-10000 | 44 | 94 | 67.1 |
| 10000-12000 | 31 | 125 | 89.3 |
| 12000-14000 | 12 | 137 | 97.9 |
| 14000-20000 | 3 | 140 | 100.0 |
| Total | 140 | | |
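A minimal Python sketch reproducing the cumulative percentages of Example 4 by cumulating first and then converting to percentages of the total; rounding to one decimal place matches the table:

```python
from itertools import accumulate

families = [8, 15, 27, 44, 31, 12, 3]  # income classes 2000-4000, ..., 14000-20000

cumulative = list(accumulate(families))
total = cumulative[-1]
cumulative_pct = [round(100 * c / total, 1) for c in cumulative]

print(cumulative)      # [8, 23, 50, 94, 125, 137, 140]
print(cumulative_pct)  # [5.7, 16.4, 35.7, 67.1, 89.3, 97.9, 100.0]
```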

4.10 Bivariate Frequency Distribution

In the previous sections, we described frequency distributions involving one
variable only. Such frequency distributions are called univariate frequency
distributions. In many situations, simultaneous study of two variables becomes
necessary. For example, we may want to classify data relating to the weights and
heights of a group of individuals, the income and expenditure of a group of
individuals, or the ages of husbands and wives.

Data classified on the basis of two variables give rise to a so-called
bivariate frequency distribution, which, when summarized in the form of a table,
is called a bivariate (two-way) frequency table. While preparing a bivariate
frequency distribution, the values of each variable are grouped into various
classes (not necessarily the same for each variable). If the data corresponding to
one variable, say X, are grouped into m classes and the data corresponding to the
other variable, say Y, are grouped into n classes, the bivariate table will consist
of m × n cells. By going through the different pairs of values (X, Y) of the
variables and using tally marks, we can find the frequency of each cell and thus
obtain the bivariate frequency table. The format of a bivariate frequency table is
given below:

Format of Bivariate Frequency Table

| Y class intervals (mid-values of Y) | X class intervals (mid-values of X) | Marginal frequency of Y |
| Y classes | f(x, y) | fy |
| Marginal frequency of X | fx | Σfx = Σfy = N |

Here, f(x, y) is the frequency of the pair (x, y). The frequency distribution of the
values of the variable X together with their frequency totals (fx) is called the
marginal distribution of X, and the frequency distribution of the values of the
variable Y together with their frequency totals (fy) is known as the marginal frequency
distribution of Y. The total of the marginal frequencies is called the grand
total (N).
Example 5:
The data given below relate to the height and weight of 20 persons. Construct a
bivariate frequency table with height class intervals 62-64, 64-66… and weight class
intervals 115-125, 125-135…, and write down the marginal distributions of X and Y.

S/N Height Weight S/No Height Weight


1 70 170 11 70 163
2 65 135 12 67 139
3 65 136 13 63 122
4 64 137 14 68 134
5 69 148 15 67 140
6 63 121 16 69 132
7 65 117 17 65 120
8 70 128 18 68 148
9 71 143 19 67 129
10 62 129 20 67 152
Solution:
Bivariate frequency table showing height and weight of persons

| Weight (y) \ Height (x) | 62-64 | 64-66 | 66-68 | 68-70 | 70-72 | Total |
| 115-125 | II (2) | II (2) | | | | 4 |
| 125-135 | I (1) | | I (1) | II (2) | I (1) | 5 |
| 135-145 | | III (3) | II (2) | | I (1) | 6 |
| 145-155 | | | I (1) | II (2) | | 3 |
| 155-165 | | | | | I (1) | 1 |
| 165-175 | | | | | I (1) | 1 |
| Total | 3 | 5 | 4 | 4 | 4 | 20 |

The marginal distributions of height and weight are given in the following tables.

Marginal distribution of height (X)

| CI | Frequency |
| 62-64 | 3 |
| 64-66 | 5 |
| 66-68 | 4 |
| 68-70 | 4 |
| 70-72 | 4 |
| Total | 20 |

Marginal distribution of weight (Y)

| CI | Frequency |
| 115-125 | 4 |
| 125-135 | 5 |
| 135-145 | 6 |
| 145-155 | 3 |
| 155-165 | 1 |
| 165-175 | 1 |
| Total | 20 |
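Tallying a bivariate table can be sketched in Python by mapping each (height, weight) pair to a (row, column) cell. This sketch assumes exclusive classes (so a weight of exactly 135 falls in 135-145, as in the table above) and that every value lies within the stated classes; the helper `class_index` is hypothetical:

```python
import bisect
from collections import Counter

# (height, weight) of the 20 persons in Example 5, in serial-number order.
persons = [(70, 170), (65, 135), (65, 136), (64, 137), (69, 148),
           (63, 121), (65, 117), (70, 128), (71, 143), (62, 129),
           (70, 163), (67, 139), (63, 122), (68, 134), (67, 140),
           (69, 132), (65, 120), (68, 148), (67, 129), (67, 152)]

height_lowers = [62, 64, 66, 68, 70]            # classes 62-64, ..., 70-72
weight_lowers = [115, 125, 135, 145, 155, 165]  # classes 115-125, ..., 165-175


def class_index(lowers, value):
    """Index of the exclusive class (lower <= value < next lower) for value."""
    return bisect.bisect_right(lowers, value) - 1


# Cell frequencies f(x, y), keyed by (weight row, height column).
cells = Counter((class_index(weight_lowers, w), class_index(height_lowers, h))
                for h, w in persons)

rows, cols = len(weight_lowers), len(height_lowers)
fx = [sum(cells[(r, c)] for r in range(rows)) for c in range(cols)]  # marginal of X
fy = [sum(cells[(r, c)] for c in range(cols)) for r in range(rows)]  # marginal of Y

for r, lower in enumerate(weight_lowers):
    print(f"{lower}-{lower + 10}", [cells[(r, c)] for c in range(cols)], fy[r])
print("Total", fx, sum(fx))
```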

CHAPTER FIVE

DIAGRAMMATIC AND GRAPHICAL REPRESENTATION

5.1 Introduction
In the previous chapter, we discussed the techniques of classification and
tabulation that help in summarizing the collected data and presenting them in a
systematic manner. However, these forms of presentation do not always prove
to be interesting to the common man. One of the most convincing and appealing
ways in which statistical results may be presented is through diagrams and
graphs. A single diagram can often represent given data more effectively
than a thousand words.

Moreover, even a layman who has nothing to do with numbers can
understand diagrams. Evidence of this can be found in newspapers, magazines,
journals, advertisements, etc. This chapter illustrates
some of the major types of diagrams and graphs frequently used in presenting
statistical data.

5.2 Diagrams
A diagram is a visual form of presentation of statistical data, highlighting their
basic facts and relationships. Diagrams drawn on the basis of the collected data
are easily understood and appreciated by all. They are readily
intelligible and save a considerable amount of time and energy.

5.3 Significance of Diagrams and Graphs:

Diagrams and graphs are extremely useful because of the following reasons.

1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible.
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.

5.4 General rules for constructing diagrams:

The construction of diagrams is an art, which can be acquired through practice.
However, observing some general guidelines can help in making diagrams more
attractive and effective. The diagrammatic presentation of statistical facts will be
advantageous provided the following rules are observed in drawing diagrams.

1. A diagram should be neatly drawn and attractive.
2. The measurements of geometrical figures used in a diagram should be
accurate and proportional.
3. The size of the diagrams should match the size of the paper.
4. Every diagram must have a suitable but short heading.
5. The scale should be mentioned in the diagram.
6. Diagrams should be neatly as well as accurately drawn with the help of
drawing instruments.
7. An index (key) must be given for identification, so that the reader can easily make
out the meaning of the diagram.
8. A footnote must be given at the bottom of the diagram.
9. Economy in cost and energy should be exercised in drawing diagrams.

5.5 Types of diagrams:

In practice, a very large variety of diagrams is in use, and new ones are
constantly being added. For the sake of convenience and simplicity, they may be
divided under the following heads:

1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
5.5.1 One-dimensional diagrams
In such diagrams, only one dimension, i.e. height, is used, and the
width is not considered. These diagrams are in the form of bar or line charts and
can be classified as:

1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram

Line Diagram
A line diagram is used where there are many items to be shown and there
is not much difference in their values. Such a diagram is prepared by drawing a
vertical line for each item according to the scale. The distance between lines is
kept uniform.

A line diagram makes comparison easy, but it is less attractive.

Example 1:
Show the following data by a line chart:

| No. of children | 0 | 1 | 2 | 3 | 4 | 5 |
| Frequency | 10 | 14 | 9 | 6 | 4 | 2 |

[Figure: Line Diagram of frequency (vertical axis, 0 to 16) against number of children (horizontal axis)]

Simple Bar Diagram:

A simple bar diagram can be drawn on either a horizontal or a vertical base, but bars
on a horizontal base are more common. Bars must be of uniform width, and the
intervening spaces between bars must be equal. While constructing a simple bar
diagram, the scale is determined on the basis of the highest value in the series.

To make the diagram attractive, the bars can be coloured. Bar diagrams are widely used
in business and economics. However, an important limitation of such diagrams is
that they can present only one classification or one category of data. For
example, while presenting the population for the last five decades, one can only
depict the total population in a simple bar diagram, and not its sex-wise
distribution.

Example 2:
Represent the following data by a bar diagram.

| Year | Production (in tonnes) |
| 1991 | 45 |
| 1992 | 40 |
| 1993 | 42 |
| 1994 | 55 |
| 1995 | 50 |

Solution:
[Figure: Simple Bar Diagram of production (vertical axis, 0 to 60) for the years 1991 to 1995 (horizontal axis)]
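A drawn diagram is best made on graph paper or with plotting software, but the rule that the scale is fixed from the highest value can be illustrated with a small text rendering in Python. This is a sketch only: the bars here run horizontally, one line each, and `max_width` is an arbitrary choice:

```python
def text_bar_chart(data, max_width=50):
    """Render a simple bar diagram as text, one bar per item.

    The scale is fixed from the highest value so that the longest bar
    is max_width characters long; every bar has the same thickness.
    """
    highest = max(data.values())
    scale = max_width / highest  # characters per unit
    lines = []
    for label, value in data.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label} | {bar} {value}")
    return lines


production = {1991: 45, 1992: 40, 1993: 42, 1994: 55, 1995: 50}  # tonnes
print("\n".join(text_bar_chart(production)))
```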

Multiple Bar Diagram:

A multiple bar diagram is used for comparing two or more sets of statistical
data. Bars are constructed side by side to represent the sets of values for
comparison. To distinguish the bars, they may be differently coloured
or given different types of crossings or dotting, etc. An index is also
prepared to identify the meaning of the different colours or dottings.

Example 3:
Draw a multiple bar diagram for the following data.

Year   Profit before tax        Profit after tax
       (in lakhs of rupees)     (in lakhs of rupees)
1998   195                      80
1999   200                      87
2000   165                      45
2001   140                      32
Solution:
[Figure: Multiple Bar Diagram. X-axis: Year (1998 to 2001); Y-axis: profit in lakhs of rupees (0 to 200). Legend: Profit before tax, Profit after tax.]

Sub-divided Bar Diagram:
In a sub-divided bar diagram, each bar is divided into parts in proportion to the values given in the data, and the whole bar represents the total. Such diagrams are also called component bar diagrams. The sub-divisions are distinguished by different colours, hatching or dotting.

The main defect of such a diagram is that the parts do not share a common base, so the individual components of the data cannot be compared accurately.

Example 4:
Represent the following data by a sub-divided bar diagram.

Expenditure items   Monthly expenditure (in Rs.)
                    Family A    Family B
Food                75          95
Clothing            20          25
Education           15          10
Housing Rent        40          65
Miscellaneous       25          35


Solution:
[Figure: Sub-divided Bar Diagram. X-axis: expenditure item (Family A, Family B); Y-axis: monthly expenditure in Rs. (0 to 240). Legend: Food, Clothing, Education, Housing Rent, Miscellaneous.]
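The heights in the sub-divided bar diagram of Example 4 come from stacking each item on top of the items below it. A minimal Python sketch of that arithmetic follows; the function name `stack_segments` is hypothetical and the stacking order is assumed to follow the order of the table.

```python
# Monthly expenditure (in Rs.) from Example 4, in the order the items are stacked.
items = ["Food", "Clothing", "Education", "Housing Rent", "Miscellaneous"]
family_a = [75, 20, 15, 40, 25]
family_b = [95, 25, 10, 65, 35]


def stack_segments(values):
    """Return (bottom, top) pairs for each segment of one sub-divided bar."""
    segments = []
    bottom = 0
    for value in values:
        top = bottom + value  # each item sits on top of the previous ones
        segments.append((bottom, top))
        bottom = top
    return segments


for name, values in [("Family A", family_a), ("Family B", family_b)]:
    segs = stack_segments(values)
    print(name, "total =", segs[-1][1])
    for item, (lo, hi) in zip(items, segs):
        print(f"  {item:<14} {lo:>4} to {hi:>4}")
```

The top of the last segment is the bar's total: 175 for Family A and 230 for Family B, which is why the vertical scale must reach at least 230.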

Percentage Bar Diagram:

This is another form of component bar diagram. Here the components are not the actual values but percentages of the whole. The main difference between the sub-divided bar diagram and the percentage bar diagram is that in the former the bars are of different heights, since their totals may differ, whereas in the latter all the bars are of equal height, since each bar represents 100 per cent. For data with sub-divisions, a percentage bar diagram is often more appealing than a sub-divided bar diagram.

Example 5:
Represent the following data by a percentage bar diagram.

Particulars     Factory A   Factory B
Selling Price   400         650
Quantity Sold   240         365
Wages           3500        5000
Materials       2100        3500
Miscellaneous   1400        2100
Solution:
Convert the given values into percentages as follows:


Particulars     Factory A        Factory B
                Rs.      %       Rs.      %
Selling Price   400      5       650      6
Quantity Sold   240      3       365      3
Wages           3500     46      5000     43
Materials       2100     28      3500     30
Miscellaneous   1400     18      2100     18
Total           7640     100     11615    100
[Figure: Sub-divided Percentage Bar Diagram. X-axis: particulars (Factory A, Factory B); Y-axis: percentage (0 to 100). Legend: Selling Price, Quantity Sold, Wages, Materials, Miscellaneous.]

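The source does not say how the percentages in Example 5 were rounded. Ordinary rounding gives 27 for Factory A's Materials (2100/7640 = 27.49%), which would make that column total 99, so the table appears to adjust it. The sketch below assumes the largest-remainder method, a common way of producing whole-number percentages that total exactly 100; with these data it reproduces every percentage in the table. The function name `whole_percentages` is hypothetical.

```python
import math

# Data from Example 5 (values in Rs.).
particulars = ["Selling Price", "Quantity Sold", "Wages", "Materials", "Miscellaneous"]
factory_a = [400, 240, 3500, 2100, 1400]
factory_b = [650, 365, 5000, 3500, 2100]


def whole_percentages(values):
    """Convert values to whole-number percentages that add up to exactly 100.

    Largest-remainder method: round every share down, then give the leftover
    points one at a time to the shares with the largest fractional parts.
    """
    total = sum(values)
    exact = [v * 100 / total for v in values]
    floors = [math.floor(x) for x in exact]
    shortfall = 100 - sum(floors)
    # Indices ordered by fractional part, largest first.
    order = sorted(range(len(values)), key=lambda i: exact[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


for name, values in [("Factory A", factory_a), ("Factory B", factory_b)]:
    print(name, sum(values), dict(zip(particulars, whole_percentages(values))))
```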
5.5.2 Two-dimensional Diagrams:
In one-dimensional diagrams, only length is taken into account. In two-dimensional diagrams, however, the area represents the data, so both length and breadth have to be taken into account. Such diagrams are also called area diagrams or surface diagrams. The important types of area diagrams are:

1. Rectangles 2. Squares 3. Pie-diagrams

Rectangles:

Rectangles are used to represent the relative magnitude of two or more values. The area of each rectangle is kept in proportion to the value it represents, and the rectangles are placed side by side for comparison. When two sets of figures are to be represented by rectangles, either of two methods may be adopted: we may represent the figures as they are given, or we may convert them to percentages and then subdivide the length into the various components. The percentage sub-divided rectangular diagram is more popular than the ordinary sub-divided rectangular diagram because it allows comparison on a percentage basis.

Example 6:
Represent the following data by a sub-divided percentage rectangular diagram.

Items of Expenditure   Family A            Family B
                       (Income Rs. 5000)   (Income Rs. 8000)
Food                   2000                2500
Clothing               1000                2000
House Rent             800                 1000
Fuel and Lighting      400                 500
Miscellaneous          800                 2000
Total                  5000                8000
Solution:
The items of expenditure are converted into percentages as shown below:

Items of Expenditure   Family A          Family B
                       Rs.      %        Rs.      %
Food                   2000     40       2500     31
Clothing               1000     20       2000     25
House Rent             800      16       1000     13
Fuel and Lighting      400      8        500      6
Miscellaneous          800      16       2000     25
Total                  5000     100      8000     100
[Figure: Sub-divided Percentage Rectangular Diagram. Rectangles for Family A (Rs. 5000) and Family B (Rs. 8000); Y-axis: percentage. Legend: Food, Clothing, House Rent, Fuel and Lighting, Miscellaneous.]
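One reading of the figure labels (Family A up to 5000, Family B up to 8000) is that each rectangle's width is proportional to the family's total income, while its full height stands for 100 per cent. The sketch below works under that assumption, with a hypothetical horizontal scale of 1 cm = Rs. 1000. It computes the exact segment heights and shows that each segment's area (width × percentage) comes out proportional to the rupee amount, which is what "area kept in proportion to the values" requires.

```python
# Data from Example 6 (expenditure in Rs.).
items = ["Food", "Clothing", "House Rent", "Fuel and Lighting", "Miscellaneous"]
families = {
    "Family A": [2000, 1000, 800, 400, 800],
    "Family B": [2500, 2000, 1000, 500, 2000],
}
RS_PER_CM = 1000  # assumed horizontal scale: 1 cm represents Rs. 1000 of income


def rectangle(values):
    """Return the rectangle width (cm) and the exact percentage height of each segment."""
    total = sum(values)
    width = total / RS_PER_CM
    heights = [v * 100 / total for v in values]
    return width, heights


for name, values in families.items():
    width, heights = rectangle(values)
    print(f"{name}: width {width} cm")
    for item, h in zip(items, heights):
        # width x percentage height equals value / 10 here, so area is proportional to value
        area = width * h
        print(f"  {item:<18} {h:6.2f}%  area {area:.1f}")
```

The table above rounds Family B's exact shares of 31.25, 12.5 and 6.25 per cent to 31, 13 and 6.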

Squares:
The rectangular method of diagrammatic presentation is difficult to use when the values of the items vary widely. The square diagram is simple to draw: take the square root of the value of each item to be shown, select a suitable scale, and draw squares whose sides are proportional to these square roots. Since the area of a square is the square of its side, the areas of the squares are then proportional to the original values.

Example 7:
The yield of rice in kg per acre in five countries is:

Country                          U.S.A.   Australia   U.K.   Canada   India
Yield of rice (in kg per acre)   6400     1600        2500   3600     4900

Represent the above data by a square diagram.
Solution: To draw the square diagram, we calculate as follows (scale: 1 cm of side = 20 units of square root):

Country     Yield   Square root   Side of the square (in cm)
U.S.A.      6400    80            4
Australia   1600    40            2
U.K.        2500    50            2.5
Canada      3600    60            3
India       4900    70            3.5


[Figure: Square diagram. Squares of side 4 cm (USA), 2 cm (Australia), 2.5 cm (UK), 3 cm (Canada) and 3.5 cm (India).]
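The side lengths in Example 7 follow from taking square roots and dividing by the scale. The scale of 1 cm = 20 units of square root is inferred from the table (a root of 80 gives 4 cm); the function name `square_side_cm` is hypothetical.

```python
import math

# Yield of rice in kg per acre (Example 7).
yields = {"U.S.A.": 6400, "Australia": 1600, "U.K.": 2500, "Canada": 3600, "India": 4900}
UNITS_PER_CM = 20  # scale inferred from the table: 1 cm of side = 20 units of square root


def square_side_cm(value):
    """Side of the square in cm: square root of the value, divided by the scale."""
    return math.sqrt(value) / UNITS_PER_CM


for country, value in yields.items():
    print(f"{country:<10} {value:>5}  root {math.sqrt(value):>4.0f}  side {square_side_cm(value)} cm")
```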


Pie Diagram or Circular Diagram:
Another way of preparing a two-dimensional diagram is in the form of circles. In such diagrams, both the total and its component parts, or sectors, can be shown. The area of a circle is proportional to the square of its radius.

When making comparisons, pie diagrams should be used on a percentage basis and not on an absolute basis. The first step in constructing a pie diagram is to convert the value of each component into the corresponding angle at the centre of the circle, using

Angle of a sector = (Component value / Total value) × 360°

The second step is to draw a circle of appropriate size with a compass; the radius depends on the available space and other factors of presentation. The third step is to measure off the angle of each sector on the circle with a protractor.
Example 8:
Draw a pie diagram for the following data on the production of sugar (in quintals) in various countries.

Country     Production of sugar (in quintals)
Cuba        62
Australia   47
India       35
Japan       16
Egypt       6
Solution:
The values are expressed in degrees as follows.


Country     Production of sugar   Angle
            (in quintals)         (in degrees)
Cuba        62                    134
Australia   47                    102
India       35                    76
Japan       16                    35
Egypt       6                     13
Total       166                   360

[Figure: Pie Diagram. Sectors: Cuba, Australia, India, Japan, Egypt.]
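The angle of each sector in Example 8 is the component's share of the total multiplied by 360. A minimal Python sketch (the function name `to_degrees` is hypothetical):

```python
# Sugar production in quintals (Example 8).
production = {"Cuba": 62, "Australia": 47, "India": 35, "Japan": 16, "Egypt": 6}


def to_degrees(data):
    """Convert each component to the angle of its sector: value / total x 360, rounded."""
    total = sum(data.values())
    return {k: round(v / total * 360) for k, v in data.items()}


angles = to_degrees(production)
print(angles, sum(angles.values()))
```

Here the rounded angles happen to add up to exactly 360. With other data, rounding each angle separately can leave the total a degree or so away from 360, in which case the largest sector is usually adjusted.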

5.5.3 Three-dimensional Diagrams:

Three-dimensional diagrams, also known as volume diagrams, consist of cubes, cylinders, spheres, etc. In such diagrams three things, namely length, width and height, have to be taken into account. Of all these figures, cubes are the easiest to draw. The side of a cube is drawn in proportion to the cube root of the magnitude of the data.

Cube roots can be found with the help of logarithms: divide the logarithm of the figure by 3, and the antilogarithm of the result is the cube root.

Example 9:
Represent the following data by a volume diagram.

Category        Number of students
Undergraduate   64000
Postgraduate    27000
Professionals   8000
Solution:
The sides of the cubes can be determined as follows (scale: 1 cm of side = 10 units of cube root):

Category        Number of students   Cube root   Side of cube
Undergraduate   64000                40          4 cm
Postgraduate    27000                30          3 cm
Professional    8000                 20          2 cm

[Figure: Volume diagram. Cubes for Undergraduate, Postgraduate and Professional.]
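The logarithm method for cube roots described above can be checked with a short Python sketch. The scale of 1 cm = 10 units is inferred from the table (a cube root of 40 gives 4 cm), and `cube_root_by_logs` is a hypothetical name.

```python
import math

# Number of students (Example 9).
students = {"Undergraduate": 64000, "Postgraduate": 27000, "Professional": 8000}
UNITS_PER_CM = 10  # scale inferred from the table: 1 cm of side = 10 units of cube root


def cube_root_by_logs(x):
    """Cube root via logarithms: divide log10(x) by 3, then take the antilog."""
    return 10 ** (math.log10(x) / 3)


for category, n in students.items():
    root = cube_root_by_logs(n)
    print(f"{category:<14} root {root:.2f}  side {root / UNITS_PER_CM:.1f} cm")
```

In floating point the antilog may come out as 39.99999... rather than exactly 40, which is why the results are printed rounded; `x ** (1 / 3)` gives the same values without the explicit logarithm step.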

5.5.4 Pictograms and Cartograms:

Pictograms are not abstract presentations such as lines or bars; they actually depict the kind of data we are dealing with. Pictures are attractive and easy to comprehend, so this method is particularly useful for presenting statistics to the layman. When pictograms are used, the data are represented through a carefully selected pictorial symbol.


Cartograms, or statistical maps, are used to give quantitative information on a geographical basis. They are used to represent spatial distributions. The quantities on the map can be shown in many ways, such as through shades, colours or dots, or by placing a pictogram in each geographical unit.

5.6 Graphs:
A graph is a visual form of presentation of statistical data. A graph is more attractive than a table of figures, and even a layman can understand the message of the data from a graph. Comparisons between two or more phenomena can be made very easily with the help of a graph.

Here we shall discuss only some of the important and popular types of graphs:

1. Histogram 2. Frequency Polygon

3. Frequency Curve 4. Ogive 5. Lorenz Curve

5.6.1 Histogram:
A histogram is a bar chart or graph showing the frequency of occurrence of each value (or class) of the variable being analysed. In a histogram, data are plotted as a series of rectangles. Class intervals are shown on the X-axis and the frequencies on the Y-axis.

The height of each rectangle represents the frequency of the class interval. Adjacent rectangles are drawn touching one another, with no gaps, so as to give a continuous picture. Such a graph is also called a staircase or block diagram.

However, a histogram cannot be constructed for a distribution with open-end classes. It is also quite misleading if the distribution has unequal class intervals and suitable adjustments in the frequencies are not made.

Example 10:
Draw a histogram for the following data.

Daily Wages (in Rs.)   Number of Workers
0-50                   8
50-100                 16
100-150                27
150-200                19
200-250                10
250-300                6
Solution:
[Figure: Histogram. X-axis: daily wages in Rs. (0 to 300); Y-axis: number of workers (0 to 30).]

Example 11:
For the following data, draw a histogram.

Marks   Number of Students
21-30   6
31-40   15
41-50   22
51-60   31
61-70   17
71-80   9
Solution:
To draw a histogram, the frequency distribution should be continuous. If it is not, first make it continuous: subtract half the gap between consecutive classes, here (31 - 30)/2 = 0.5, from each lower limit and add it to each upper limit.

Marks       Number of Students
20.5-30.5   6
30.5-40.5   15
40.5-50.5   22
50.5-60.5   31
60.5-70.5   17
70.5-80.5   9
[Figure: Histogram. X-axis: marks (20.5 to 80.5); Y-axis: number of students (0 to 35).]
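The conversion to continuous classes in Example 11 can be sketched as follows. It assumes that the gap between consecutive classes is the same throughout, as it is here; the function name `make_continuous` is hypothetical.

```python
# Marks classes from Example 11, written with gaps (21-30, 31-40, ...).
classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
students = [6, 15, 22, 31, 17, 9]


def make_continuous(classes):
    """Turn inclusive classes with gaps into continuous class boundaries.

    The correction is half the gap between one class's upper limit and the
    next class's lower limit (assumed equal for all classes); it is subtracted
    from every lower limit and added to every upper limit.
    """
    correction = (classes[1][0] - classes[0][1]) / 2  # (31 - 30) / 2 = 0.5
    return [(lo - correction, hi + correction) for lo, hi in classes]


for (lo, hi), f in zip(make_continuous(classes), students):
    print(f"{lo}-{hi}  {f}")
```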

Example 12:
Draw a histogram for the following data.

Profits (in lakhs)   Number of Companies
0-10                 4
10-20                12
20-30                24
30-50                32
50-80                18
80-90                9
90-100               3
Solution:
When the class intervals are unequal, a correction for unequal class intervals must be made. Taking the class width of 10 as the standard, the frequencies are adjusted as follows: the frequency of the class 30-50 is divided by 2, since its class interval is twice the standard width, and the frequency of the class 50-80 is divided by 3, since its class interval is three times the standard width. Then draw the histogram.

The adjusted frequency table is:

Profits (in lakhs)   Number of Companies
0-10                 4
10-20                12
20-30                24
30-40                16
40-50                16
50-60                6
60-70                6
70-80                6
80-90                9
90-100               3
[Figure: Histogram. X-axis: profits in lakhs (0 to 100); Y-axis: number of companies (0 to 30).]
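The adjustment for unequal class intervals in Example 12 scales every frequency to the narrowest class width, so that the area of each rectangle, rather than its height, stays proportional to the frequency. A minimal sketch, with `adjusted_heights` as a hypothetical name:

```python
# Profit classes (in lakhs) and number of companies from Example 12.
classes = [(0, 10), (10, 20), (20, 30), (30, 50), (50, 80), (80, 90), (90, 100)]
companies = [4, 12, 24, 32, 18, 9, 3]


def adjusted_heights(classes, freqs):
    """Scale each frequency to the narrowest class width (the standard width)."""
    standard = min(hi - lo for lo, hi in classes)  # 10 here
    return [f * standard / (hi - lo) for (lo, hi), f in zip(classes, freqs)]


for (lo, hi), h in zip(classes, adjusted_heights(classes, companies)):
    print(f"{lo}-{hi}: height {h:g}")
```

The class 30-50 gets a height of 32 × 10/20 = 16 and the class 50-80 gets 18 × 10/30 = 6, matching the adjusted table.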

5.6.2 Frequency Polygon:

If we mark the midpoints of the top horizontal sides of the rectangles in a histogram and join them by straight lines, the figure so formed is called a frequency polygon. This is done under the assumption that the frequencies in a class interval are evenly distributed throughout the class. When the polygon is closed by joining its ends to the base line at the midpoints of the empty classes on either side, its area is equal to the area of the histogram, because each triangle cut off from the histogram is exactly matched by an equal triangle added outside it.

Example 13:
Draw a frequency polygon for the following data.

Weight (in kg)   Number of Students
30-35            4
35-40            7
40-45            10
45-50            18
50-55            14
55-60            8
60-65            3
[Figure: Frequency Polygon. X-axis: weight in kg (30 to 65); Y-axis: number of students (0 to 20).]
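The vertices of the frequency polygon in Example 13 are the class midpoints. The sketch below assumes the common convention of closing the polygon at the midpoints of an empty class of the same width at each end. It then uses the trapezoidal rule to confirm that the polygon encloses the same area as the histogram. The function names are hypothetical.

```python
# Weight classes (in kg) and number of students from Example 13.
classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60), (60, 65)]
students = [4, 7, 10, 18, 14, 8, 3]


def polygon_points(classes, freqs):
    """Vertices of the frequency polygon: class midpoints, closed at both ends
    by an empty class of the same width (frequency 0)."""
    width = classes[0][1] - classes[0][0]
    mids = [(lo + hi) / 2 for lo, hi in classes]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under a polyline by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


points = polygon_points(classes, students)
histogram_area = sum((hi - lo) * f for (lo, hi), f in zip(classes, students))
print(points)
print(area_under(points), histogram_area)  # both equal 320
```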

5.6.3 Frequency Curve:

If the midpoints of the upper sides of the rectangles of a histogram are joined by a smooth freehand curve, the diagram is called a frequency curve. The curve should begin and end at the base line.

Example 14:
Draw a frequency curve for the following data.

Monthly Wages (in Rs.)   No. of Families
0-1000                   21
1000-2000                35
2000-3000                56
3000-4000                74
4000-5000                63
5000-6000                40
6000-7000                29
7000-8000                14
Solution:
[Figure: Frequency Curve. X-axis: monthly wages in Rs. (0 to 8000); Y-axis: number of families (0 to 80).]
5.6.4 Ogives
For a set of observations, we know how to construct a frequency distribution. In some cases, we may require the number of observations less than a given value or more than a given value. This is obtained by accumulating (adding) the frequencies up to (or above) that value. The accumulated frequency is called the cumulative frequency.

A table listing these cumulative frequencies is called a cumulative frequency table, and the curve obtained by plotting the cumulative frequencies is called a cumulative frequency curve or an ogive.

There are two methods of constructing an ogive, namely:

1. The ‘less than ogive’ method

2. The ‘more than ogive’ method.

In the less than ogive method, we start with the upper limits of the classes and keep adding the frequencies; each cumulative frequency is plotted against the upper limit of its class, which gives a rising curve. In the more than ogive method, we start with the lower limits of the classes and subtract the frequency of each class in turn from the total frequency; each result is plotted against the lower limit of its class, which gives a declining curve.

Example 15:
Draw the Ogives for the following data.

Class interval   Frequency
20-30            4
30-40            6
40-50            13
50-60            25
60-70            32
70-80            19
80-90            8
90-100           3
Solution:
Class limit   Less than ogive   More than ogive
20            0                 110
30            4                 106
40            10                100
50            23                87
60            48                62
70            80                30
80            99                11
90            107               3
100           110               0
[Figure: Ogives (less than and more than). X-axis: class limit (20 to 100), 1 cm = 10 units; Y-axis: cumulative frequency (0 to 120), 1 cm = 10 units.]
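Both columns of the ogive table in Example 15 are running totals. The sketch below uses `itertools.accumulate` from the Python standard library. The less than column pairs each running total with an upper limit. The more than column subtracts, from the total of 110, everything below each lower limit.

```python
from itertools import accumulate

# Class intervals and frequencies from Example 15.
lower_limits = [20, 30, 40, 50, 60, 70, 80, 90]
upper_limits = [30, 40, 50, 60, 70, 80, 90, 100]
freqs = [4, 6, 13, 25, 32, 19, 8, 3]

total = sum(freqs)  # 110
running = list(accumulate(freqs))  # 4, 10, 23, ..., 110

# Less than ogive: 0 at the lowest limit, then running totals at each upper limit.
less_than = [(lower_limits[0], 0)] + list(zip(upper_limits, running))

# More than ogive: total minus the frequency below each lower limit, then 0 at the top.
before = [0] + running[:-1]
more_than = [(lo, total - b) for lo, b in zip(lower_limits, before)] + [(upper_limits[-1], 0)]

print("Limit  Less than  More than")
for (x, lt), (_, mt) in zip(less_than, more_than):
    print(f"{x:>5}  {lt:>9}  {mt:>9}")
```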

5.6.5 Lorenz Curve

The Lorenz curve is a graphical method of studying dispersion. It was introduced by the economist and statistician Max O. Lorenz to study the distribution of wealth and income. It is also used to study the variability in the distribution of profits, wages, revenue, etc.

It is especially used to study the degree of inequality in the distribution of income and wealth between countries or between different periods. To draw it, the values of each of the two variables are cumulated and expressed as percentages of their totals, and the cumulative percentages of one variable are plotted against the cumulative percentages of the other.

The curve starts at the origin (0, 0) and ends at (100, 100). If wealth, revenue, land, etc. were equally distributed among the people of a country, the Lorenz curve would be the diagonal of the square, called the line of equal distribution. In practice this almost never happens. The farther the Lorenz curve lies from the diagonal, the more unequally wealth, revenue, land, etc. are distributed among the people.


Example 16:
The following table gives the profit earned (in thousands) and the number of companies earning it in two areas, A and B. Draw their Lorenz curves in the same diagram and interpret them.

Profit earned     Number of Companies
(in thousands)    Area A    Area B
5                 7         13
26                12        25
65                14        43
89                28        57
110               33        45
155               25        28
180               18        13
200               8         6
Solution:
         Profits                Area A                  Area B
Profit   Cum.   Cum. %    No.   Cum.   Cum. %    No.   Cum.   Cum. %
5        5      1         7     7      5         13    13     6
26       31     4         12    19     13        25    38     17
65       96     12        14    33     23        43    81     35
89       185    22        28    61     42        57    138    60
110      295    36        33    94     65        45    183    80
155      450    54        25    119    82        28    211    92
180      630    76        18    137    94        13    224    97
200      830    100       8     145    100       6     230    100

(No. = number of companies; Cum. = cumulative total; Cum. % = cumulative percentage of the total.)
[Figure: Lorenz Curve. X-axis: cumulative percentage of companies (0 to 100); Y-axis: cumulative percentage (0 to 100). Curves: Line of Equal Distribution, Area A, Area B.]
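The plotting points for the Lorenz curves in Example 16 come from cumulating each column and expressing the running totals as percentages of the column total. Following the axis label in the figure, this sketch takes the cumulative percentage of companies as x and the cumulative percentage of profit as y. The function name `cumulative_percent` is hypothetical.

```python
from itertools import accumulate

# Profit (in thousands) and number of companies in each area, from Example 16.
profits = [5, 26, 65, 89, 110, 155, 180, 200]
area_a = [7, 12, 14, 28, 33, 25, 18, 8]
area_b = [13, 25, 43, 57, 45, 28, 13, 6]


def cumulative_percent(values):
    """Running totals expressed as whole-number percentages of the grand total."""
    running = list(accumulate(values))
    return [round(c * 100 / running[-1]) for c in running]


profit_pct = cumulative_percent(profits)
for name, counts in [("Area A", area_a), ("Area B", area_b)]:
    # (x, y) = (cumulative % of companies, cumulative % of profit), starting at the origin.
    points = [(0, 0)] + list(zip(cumulative_percent(counts), profit_pct))
    print(name, points)
```

Each curve is drawn through these points together with the diagonal from (0, 0) to (100, 100), the line of equal distribution.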
