Descriptive Statistics Notes

Origin and Development of Statistics
Statistics, in a sense, is as old as the human society itself. Its origin can be traced to the
old days when it was regarded as the ‘science of statecraft’ and was the by-product of
administrative activities of the State.
The word ‘Statistics’ is derived from the Latin word ‘Status’, the Italian word
‘Statista’ and the German word ‘Statistik’, each of which means a political state.
In ancient times, governments collected information on the population and on the
property or wealth of the country. The former gave the government an idea of the
manpower of the country (to safeguard the nation against external aggression, if any),
and the latter provided a basis for introducing new taxes and levies.
In India, an efficient system of collecting official and administrative statistics existed
more than 2000 years ago, particularly during the reign of Chandragupta Maurya
(324-300 BC). From Kautilya’s Arthashastra, it is known that even before 300 BC there
existed a very good system of collecting vital statistics (data relating to population).
During Akbar’s reign (1556-1605 AD), Raja Todar Mal, the then land and revenue
minister, maintained records of land and agricultural statistics. In the Ain-i-Akbari,
written in 1596-97 by Abul Fazl, one of the nine gems of Akbar’s court, we find
detailed accounts of the administrative and statistical surveys conducted during Akbar’s
reign.
In Germany, the systematic collection of official statistics originated towards the end of
the 18th century. Information regarding population and output (industrial and
agricultural) was collected in order to have an idea of the relative strength of different
areas.
In England, statistics were the outcome of the Napoleonic Wars. The wars necessitated the
systematic collection of numerical data to enable the government to assess the revenue
and expenditure with greater precision and then to levy new taxes in order to meet the
cost of war.
The origin of vital statistics lies in the 17th century. Capt. John Graunt (1620-74), known
as the father of vital statistics, was the first man to study the statistics of births and
deaths. The computation of mortality tables and the calculation of the expectation of life
at different ages led to the idea of life insurance, and the first life insurance institution
was founded in London in 1698. The theoretical development of so-called modern
statistics came during the 17th century with the introduction of the theory of probability
and the theory of games and chance, the chief contributors being the mathematicians and
gamblers of France, Germany and England. Francis Galton (1822-1911), with his works
on regression, pioneered the use of statistical methods in the field of biometry. Karl
Pearson (1857-1936) is the pioneer of correlation analysis. His discovery of the chi-square
test, the first and most important of the tests of significance, won for statistics a place
as a science.
Sir Ronald A. Fisher (1890-1962), known as the father of statistics, placed
statistics on a very sound footing by applying it to various diversified fields such as
genetics, education, agriculture etc.
Definitions of Statistics:
Statistics has been defined differently by different statisticians from time to time.
There are two reasons for this variety of definitions.
1. In ancient times, statistics was confined only to the affairs of the state. But
in modern times, the field of utility of statistics has widened considerably.
Hence, a number of old definitions which were confined to a very narrow
field of enquiry were replaced by new definitions which are much more
comprehensive and exhaustive.
2. Statistics has been defined in two ways. Some statisticians have defined it as
statistical data, i.e., numerical statements of facts, while others define it as
statistical methods, i.e., the complete body of principles and techniques used in
collecting and analyzing such data.
Some Important Definitions
Statistics as statistical data:
Webster defines statistics as “classified facts representing the conditions of people in a
State – especially those facts which can be stated in numbers or in any other tabular or
classified arrangements.” This definition confines statistics only to the data pertaining
to a State.
Bowley defines statistics as “numerical statements of facts in any department of
enquiry placed in relation to each other.”
An exhaustive definition is given by Prof. Horace Secrist, “By statistics, we mean
aggregate of facts affected to a marked extent by multiplicity of causes numerically
expressed, enumerated or estimated according to reasonable standards of accuracy
collected in a systematic manner for a predetermined purpose and placed in relation to
each other.”
Statistics as Statistical Method:
Bowley defines statistics in the following three ways:
1. Statistics may be called the science of counting.
2. Statistics may be rightly called the science of averages.
3. Statistics is the science of the measurement of social organism.
But none of Bowley’s definitions is adequate. Firstly, statistics is not confined to
the counting or collection of data; presentation, analysis and interpretation are
also covered by it. Secondly, averages are only one of the statistical tools used in the
analysis of data; others include dispersion, skewness and correlation. Finally, the third
definition restricts the application of statistics to sociology alone, whereas in modern
days statistics is used in almost all fields.
According to Boddington, “Statistics is the science of estimates and probabilities.”
This definition covers only a part of statistical methods.
According to King, “Statistics is the method of judging collective, natural or social
phenomenon from the results obtained from the analysis or enumeration or collection
of estimates.”
Lovitt defines statistics as the “Science which deals with the collection, classification
and tabulation of numerical facts as the basis for explanation, description and
comparison of phenomenon.”
The best definition is given by Croxton and Cowden. According to them, “Statistics is
defined as the science which deals with the collection, analysis and interpretation of
numerical data.”
Functions of Statistics
1. It presents the facts in a definite form: Statistics presents facts in a precise
and definite form by expressing them in numerical or quantitative terms. For
e.g., the statement ‘the number of students who passed the statistics paper in NUALS
in the year 2002-03 was higher than that in 2001-02’ does not give a clear idea of
the situation. However, the statement ‘the number of students who passed the
statistics paper in NUALS in the year 2002-03 was 54, as compared to 50 in the year
2001-02’ conveys definite information.
2. It simplifies a mass of figures: Statistics helps in condensing a mass of data into
a few significant figures. Hence, statistical methods present meaningful
overall information about the mass of data.
3. It helps in formulating and testing hypotheses: Statistical methods are
extremely helpful in formulating and testing hypotheses to develop new
theories. For e.g., whether students have benefited from extra coaching can be
tested by appropriate statistical tools.
4. It helps in prediction: Statistical methods provide helpful means of forecasting
future events. For e.g., a cement manufacturer can predict how much cement he
should produce in 2010 based on the demand for it in the current year.
5. It helps in formulation of suitable policies: Statistics provide the basic
material for framing suitable policies for the Government and other agencies.
For e.g., the data regarding population helps in determining the future needs
such as food, clothing etc.
6. It facilitates comparison: Statistical methods allow figures of the same kind to be
compared. For e.g., if we know the average marks of the students of 2
semesters in a particular subject, we can compare the averages and
conclude which semester’s students performed better in that subject.
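Functions 2 and 6 above can be illustrated with a short Python sketch. The marks are
invented for illustration only and do not come from any real class; the sketch simply
condenses each list of marks into one average and then compares the two averages.

```python
from statistics import mean

# Hypothetical marks in the same subject for two semesters (illustrative only).
semester_1 = [52, 61, 47, 70, 58]
semester_2 = [64, 59, 73, 68, 66]

# Condense each mass of figures into one significant figure: the average.
avg_1 = mean(semester_1)
avg_2 = mean(semester_2)

# Comparison is possible once both groups are summarised in the same way.
# (A tie is reported as semester 2 here; a fuller version would handle it.)
better = "semester 1" if avg_1 > avg_2 else "semester 2"

print(f"Average marks: semester 1 = {avg_1}, semester 2 = {avg_2}")
print(f"Higher average: {better}")
```

The average alone hides the spread of marks within each semester, which is why other
tools such as dispersion are also needed.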
Limitations of statistics
Statistics and its techniques are widely used in every branch of knowledge. W.I. King
rightly says: “Science of statistics is the most useful servant, but only of great value to
those who understand its proper use.” The scope of statistics is very wide and it has great
utility, but these are restricted by its limitations. The following are the important
limitations of statistics.
Statistics does not deal with individual items: Statistics deals with groups or
aggregates only, and the study of an individual fact lies outside its scope.
For example, ‘the mark secured by a student in legal history is 40’ does not constitute a
statistical statement, whereas ‘the average mark secured by the students of the 2012 batch
in legal history is 55’ does.
Statistics deals with quantitative data: Statistics does not study data which cannot
be measured in quantitative terms. For e.g., the average height of the students of a class,
per capita income etc. can be studied by statistical methods, but qualitative aspects such
as honesty and beauty cannot be studied directly. According to Prof. Horace Secrist,
“Some phenomena cannot be quantitatively measured; honesty, resourcefulness,
integrity, goodwill, all important in industry as well as in life, are generally not
susceptible to direct statistical measurement.” If qualitative data are converted into
quantitative data, comparison becomes possible. For e.g., the intelligence of a student can
be measured by his rank or marks scored.
Statistical laws are true only on average: Statistical laws are not exact like the laws of
physics or astronomy; they are true only on average. Statistics deals with phenomena
which are affected by a multiplicity of causes, and it is not possible to study the effect
of each of these factors separately as is done under experimental methods. Because of
this limitation, the conclusions arrived at are not perfectly accurate, and the same
conclusions cannot be arrived at under similar conditions at all times. Statistics is a
means to, and not a solution of, a problem.
Statistics does not reveal the entire story: Since most problems are affected by
factors which are incapable of statistical analysis, it is not possible to examine a
problem in all its manifestations by a statistical approach alone. According to Marshall,
“Statistics are the straw out of which I, like every other economist, have to make
bricks.” Croxton says: “It must not be assumed that statistical method is the only
method for use in research; neither should this method be considered the best attack for
every problem.”
Statistics is liable to be misused: The most important limitation is that statistics must
be used by experts. According to Bowley, “Statistics only furnishes a tool though imperfect,
which is dangerous in the hands of those who do not know its use and deficiencies.”
W.I. King states, “Statistics are like clay of which you can make a god or devil as you
please.” He also remarked, “Science of statistics is the most useful servant, but only of
great value to those who understand its proper use.”
Statistics is a good tool in the hands of an expert. Its greatest limitation is that it
deals with figures, which can easily be distorted, manipulated or moulded by politicians
or by corrupt, dishonest or unskilled workers for personal, selfish motives. Statistics
neither proves nor disproves anything. It is only a tool which, if rightly used, may prove
quite useful, but if misused by unskilled or dishonest statisticians may lead to very
fallacious conclusions.
Importance of Statistics in Different fields

Importance and scope of statistics
In modern times, statistics is viewed not as a mere device for collecting numerical data
but as a means of developing sound techniques for handling and analyzing them and for
drawing valid inferences from them. As such, it is not confined to the affairs of the state
but is intruding constantly into various diversified spheres of life: social, economic
and political. It is now finding wide application in almost all sciences, social as well
as physical, such as biology, psychology, education, economics, business management
etc. It is hardly possible to name even a single department of human activity
where statistics does not creep in. It is indispensable in all phases of human
endeavour.
1. Statistics and Planning
Statistics is indispensable to planning. In the modern age, which is termed the
age of planning, governments almost all over the world (particularly those of the
budding economies) are resorting to planning for economic development. The
success of planning depends on the correct analysis of complex
statistical data.
2. Statistics and Economics
Statistical data and statistical methods are of immense help in the proper
understanding of the economic problems and formulation of economic policies.
For e.g., what to produce, how to produce and for whom to produce are
questions that need a lot of statistical data to arrive at correct decisions.
Statistical data and methods are the tools of an economist’s laboratory. Statistics
is the very foundation stone in the theory of exchange. How the national income
is to be calculated and how it is to be distributed cannot be answered without
statistics.
In recent years, econometrics, which applies statistical methods to theoretical
economic models, has been widely used in economic research
(econometrics = economics + mathematics + statistics).
Statistical methods help not only in framing economic policies but also in
evaluating their effect. As Alfred Marshall, the renowned economist, observed,
“Statistics are the straw out of which I, like every other economist, have to make
bricks.”
3. Statistics and Business
Statistics is an indispensable tool of production control also. Business
executives are relying more and more on statistical techniques for studying the
needs and desires of the consumers and for many other purposes. The success of
a business more or less depends upon the accuracy and precision of a statistical
forecast.
Suppose a businessman wants to manufacture readymade garments. Before
starting production, he must have an overall idea of how many garments are
to be manufactured, how much raw material and labour are needed for that,
and what the quality, shape, colour, size etc. of the garments should be. Thus a
production plan must be formulated in advance, which cannot be done
without quantitative facts about the details mentioned above. That is
why most industrial and commercial enterprises employ trained
and efficient statisticians.
4. Statistics and Industry
In industry, statistics is very widely used in quality control. In production
engineering, statistical tools such as inspection plans and control charts are used
to find out whether the product conforms to specifications. In inspection plans,
we have to resort to some kind of sampling, which is a very important aspect of
statistics.
5. Statistics and State
Since ancient times, ruling kings and chiefs have relied heavily on statistics
in framing suitable military and fiscal policies. Most of the statistics they
collected, such as those of crimes, military strength, population and taxes,
were a by-product of administrative activity. The concept of the State has changed
from that of simply maintaining law and order to that of a welfare state. Statistical
data and methods are of great help in promoting human welfare. The state
collects statistics on several problems, and these statistics help in formulating
suitable policies. For e.g., the transport department cannot solve its problems
unless it knows how many buses are operating at present, what the total
requirement is, etc.
6. Statistics and Natural Science
Statistical techniques have proved to be extremely useful in the study of all
natural sciences such as biology, zoology and medicine. In diagnosing diseases,
a doctor has to rely heavily on actual data such as body temperature, pulse rate
etc. Similarly, in judging the efficacy of a particular drug against a
certain disease, experiments have to be conducted, and success or failure
is judged by the number of people who are cured after using the drug.
7. Statistics and Research
Statistics is indispensable in research. Most of the advancements in
knowledge have taken place because of experiments conducted with the help of
statistical methods.
8. Statistics and other uses
Statistics are useful to bankers, brokers, insurance companies, social workers,
labour unions, politicians etc.
E.g., politicians and their supporters are immensely interested in knowing their
prospects of winning an election. By sampling a few voters prior to the
election, the percentage of votes the candidate will receive can be
estimated. This estimate could be used to decide whether greater effort
by the candidate is required to assure success. Similarly, the premium rates of
life insurance companies are based upon a very careful study of the expectancy of
life.
Statistical Survey
A survey is a process of collecting data from existing population units with no
particular control over the factors that may affect the population characteristics of
interest in the study. For e.g., in a study of the salaries of workers in a factory, salary
may be affected by a number of factors such as educational level, nature of job etc. As
we collect info about workers’ salaries, we have no control over these factors; they are
simply existing attributes of the workers. A statistical survey may be either a general
purpose survey or a specific purpose survey (also known as a special purpose survey). In
a general purpose survey, we obtain data which are useful for several purposes, e.g., a
population census. Such a survey provides info not only about the total population,
but also about its division into males and females, literates and illiterates, employed
and unemployed etc. A special purpose survey is one in which the data obtained are
useful in analyzing a particular problem only.
A statistical survey passes through several stages before completion, starting from
planning the survey and ending with writing the final report. These stages can be
summarized under two broad heads: Planning the survey and executing the survey.
Planning the survey
Proper planning of the survey is very important because the quality of survey
research depends on the preparations made before the survey is conducted. The matters
which require careful consideration at the planning stage are:
1. Statement of the problem / Purpose of the survey:
Purpose of the survey should be clearly set out at the beginning. It will
necessitate a clear statement of the problem indicating what we are interested in
determining. The object of an enquiry may be either to collect specific
information relating to a particular problem or adequate data to verify a given
proposition or to test a hypothesis.
2. Scope of the survey:
Once the purpose of the survey has been clearly stated, the next step is to decide
about the scope of the survey, i.e., its coverage with regard to the type of
information, the subject matter, and the geographical area. For e.g., an enquiry
relating to the socio-economic conditions of industrial workers may be
undertaken with the help of data relating to age, family details, income,
expenditure etc. Likewise, an enquiry may relate to India as a whole or a
particular state or an industrial town.
Three factors exert great influence on the scope of the survey:
a. Object of the enquiry
b. Availability of time
c. Availability of resources
The investigation should be carried out within a reasonable period of time;
otherwise the information collected may be outdated. For e.g., if a commission
is set up to recommend DA (Dearness Allowance) on the basis of the rise in
prices, and the commission takes more than 3 years to submit its report, there is
every possibility of its findings being outdated.
3. Unit of data collection
Before collecting the data, the statistical unit must be clearly defined for the
purpose of the investigation. (The statistical unit is the unit in terms of which the
investigator selects the attributes for enumeration, analysis and
interpretation.) For e.g., in a population census, the statistical unit is a person.
However, the problem of defining the unit is not as simple as it appears.
For e.g., if we want to study the size of sugar mills, there are different
criteria for measuring size, such as capital employed, number of
employees, total production etc. The investigator has to select one of these for
classification and then proceed to collect the necessary info.
While fixing the statistical unit for an enquiry, it is useful to keep in view the
following points:
1. The unit must suit the purpose of the study.
2. It should be simple to understand.
3. It should be specific.
4. It should be stable in character.
4. Sources of data collection
The sources of info may be either primary or secondary. Primary data are original
in character and are also called first-hand information, whereas secondary data
are collected from published or unpublished sources. Data which are primary
in the hands of one person become secondary info for another. For
e.g., suppose an investigator wants to collect some info regarding the smoking habits
of students in NUALS. If the investigator approaches the students directly and
collects the info, such info constitutes primary data for the investigator. If
similar data have already been collected by the Students Council of NUALS, and the
investigator approaches the Students Council members and collects the info from them,
such data constitute secondary data for the investigator.
• Primary sources of collecting data: questionnaires; interviews; schedule
method; observation; correspondence.
5. Technique of Data collection
There are two techniques of data collection – census method and sample
method. A census is a complete enumeration of each and every unit of the
universe. In sample method, only a part of the universe is studied and the
conclusions about the entire universe are drawn on that basis. The choice
between the census and sample methods depends upon the availability of
resources, time factor, degree of accuracy desired and nature and scope of the
enquiry.
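The difference between the two techniques can be sketched in a few lines of Python. The
universe below is a made-up list of household incomes generated with a fixed seed, and
the sample size of 100 is an arbitrary choice for illustration; none of these figures
come from a real enquiry.

```python
import random
from statistics import mean

# Hypothetical universe: monthly incomes of 1,000 households (made-up values).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
universe = [rng.randint(10_000, 50_000) for _ in range(1_000)]

# Census method: each and every unit of the universe is enumerated.
census_mean = mean(universe)

# Sample method: only a part of the universe is studied (here 100 units
# drawn at random without replacement).
sample = rng.sample(universe, 100)
sample_mean = mean(sample)

print(f"Census mean: {census_mean:.2f} (from {len(universe)} units)")
print(f"Sample mean: {sample_mean:.2f} (from {len(sample)} units)")
print(f"Difference : {abs(census_mean - sample_mean):.2f}")
```

The sample needs one tenth of the enumeration work but gives only an estimate of the
census figure, which is the trade-off between resources, time and accuracy described
above.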
6. Frame
Frame refers to a listing of all the units in the population under study. For e.g., if we
want to find out the number of workers in small scale industries in Delhi, we
must have a complete list of the names and addresses of all the small scale industries.
This list of names and addresses is called a frame. To a considerable extent, the
whole structure of the enquiry is determined by the frame. Frames may be
inaccurate, incomplete, subject to duplication, inadequate or out of date. It
is therefore essential at the outset of the survey to carry out a careful
investigation of the frame.
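Part of such an investigation of the frame can be mechanised. The following Python sketch
checks a small, invented frame for duplicated and incomplete entries; the unit names and
addresses are hypothetical.

```python
from collections import Counter

# Hypothetical frame: (unit name, address) pairs. All entries are invented.
frame = [
    ("Alpha Textiles", "12 Market Road"),
    ("Beta Plastics", "4 Ring Road"),
    ("Alpha Textiles", "12 Market Road"),  # duplicate entry
    ("Gamma Metals", ""),                  # address missing
]

# Duplication: the same unit listed more than once.
counts = Counter(frame)
duplicates = [unit for unit, n in counts.items() if n > 1]

# Incompleteness: units with a blank name or a blank address.
incomplete = [u for u in frame if not u[0].strip() or not u[1].strip()]

# A cleaned frame keeps each complete unit once, in the original order.
clean_frame = [u for u in dict.fromkeys(frame) if u not in incomplete]

print("Duplicates :", duplicates)
print("Incomplete :", incomplete)
print("Clean frame:", clean_frame)
```

Checks of this kind cannot show that a frame is out of date or that units are missing
from it altogether; those defects have to be found by comparing the frame with other
sources or with the field itself.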
7. Degree of Accuracy Desired
The investigator has to decide about the degree of accuracy that he wants to
attain. It may be pointed out that absolute accuracy is not possible in
statistical work because (a) statistics is based on estimates; (b) tools of
measurement are not always perfect; (c) there may be unintentional bias on the
part of the investigator, enumerator or informant.
8. Miscellaneous considerations:
Considerations should be given to various other matters such as whether the
enquiry is:
a. Official, semi-official or non-official;
b. Confidential or non-confidential;
c. Regular or ad hoc.
Executing the Survey

After a plan of data collection has been prepared, the next step is to execute the survey.
The various phases of the work subsequent to the planning stage are:
1. Setting up an administrative organization:
The administrative organisation required for an enquiry will depend very much
on the nature and scope of the enquiry. When the enquiry covers a large area,
supervision from a central office is likely to be difficult and in such cases, it is
best to establish regional offices.
2. Design of forms:
Careful attention should be given to the designing of the various forms that will be
used in the course of the enquiry, especially the questionnaire.
3. Selection, Training and the Supervision of the Field Investigators:
In most surveys, the data are collected through enumerators who
work on a part-time or full-time basis. The success of the survey depends upon the
field investigators, so it is essential that they are properly selected, thoroughly
trained and closely supervised.
The enumerators should be honest, intelligent, hard working and able to create a
friendly atmosphere and make the respondent feel at ease. They must speak the
language of the respondent, ask the questions properly and intelligently, and
record the responses accurately and completely.
After the enumerators have been selected, the next step is to give them proper
training. The enumerators should know the purpose of the survey, the manner in
which the data are to be collected and how the interview should be conducted. They
should know the definitions of the terms used in the questionnaire or schedule;
for e.g., in a question on the nature of the family: nuclear family (not exceeding 5
members), medium family (6-10 members) and joint family (more than 1 family living
together). It is also necessary to watch the work of the enumerators carefully.
The supervision should be carried out by superior staff (better paid, better qualified
and more experienced).
4. Control over the quality of field work and field edit:
The field check should be carried out by the supervisors, and it should be conducted in
such a manner that the investigators do not have prior knowledge of which work is going
to be checked.
After the work of collecting data is completed, the questionnaires or schedules are handed
over by the enumerators to the supervisor. While still in the field, the supervisor should
scrutinize them for omissions, inconsistencies etc. This field editing is highly useful
because:
(1) Unless the questionnaires are edited on the spot, the need for further
information to correct wrong entries made by the enumerators may be
discovered only after the enumerators have moved to another area.
(2) If errors are discovered at this stage, the enumerators can be instructed not to
make such errors in future.
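A field edit of this kind can be sketched in Python as a scan of completed schedules for
omissions and inconsistencies. The field names, the records and the rule that expenditure
above twice the income needs a follow-up are all assumptions made for illustration; a
real survey would set its own checks.

```python
# Hypothetical completed schedules; field names and values are invented.
schedules = [
    {"id": 1, "age": 34, "family_size": 4, "income": 18000, "expenditure": 15000},
    {"id": 2, "age": None, "family_size": 6, "income": 22000, "expenditure": 20000},
    {"id": 3, "age": 41, "family_size": 0, "income": 9000, "expenditure": 30000},
]


def field_edit(record):
    """Return a list of problems found in one schedule."""
    problems = []
    # Omissions: any question left unanswered.
    for field, value in record.items():
        if value is None:
            problems.append(f"omission: {field} is blank")
    # Inconsistencies: answers that cannot be right or that need a follow-up.
    if record["family_size"] is not None and record["family_size"] < 1:
        problems.append("inconsistency: family_size below 1")
    if None not in (record["income"], record["expenditure"]):
        # Assumed rule of thumb, not a standard threshold.
        if record["expenditure"] > 2 * record["income"]:
            problems.append("inconsistency: expenditure more than twice income")
    return problems


# Schedules to be returned to the enumerator while still in the area.
to_recheck = {r["id"]: field_edit(r) for r in schedules if field_edit(r)}
for schedule_id, problems in to_recheck.items():
    print(schedule_id, problems)
```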
5. Processing of Data
After the data have been collected, the efforts shift from the field to the office.
The data are to be given a thorough check, coded, transferred to cards and
tabulated. The process of coding involves translating responses into numerical
terms in order to facilitate the analysis. For e.g., the sex of the respondent may
be coded as 1 for male and 2 for female. After the material is edited and coded, it is
ready for analysis, which can be performed either by hand or by machine.
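Coding and tabulation can be sketched as follows in Python. The code book uses the codes
given in the text (male 1, female 2); the list of responses is invented for illustration.

```python
from collections import Counter

# Code book following the text: male is coded 1 and female 2.
SEX_CODES = {"male": 1, "female": 2}

# Hypothetical edited responses (invented for illustration).
responses = ["male", "female", "female", "male", "female"]

# Coding: translate each response into its numerical code.
coded = [SEX_CODES[answer] for answer in responses]

# Tabulation: count how many respondents fall under each code.
table = Counter(coded)

print("Coded responses:", coded)
for code in sorted(table):
    print(f"code {code}: {table[code]} respondents")
```

The codes are only labels: an average of the coded values would have no meaning, so
tabulation by count is the appropriate summary here.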
6. Preparation of Report
After the data have been collected and analyzed, it is usually necessary to
embody the results of the survey in the form of a report. The preparation of
report therefore constitutes the final step in the execution. The following aspects
of the survey should be highlighted in the report.
a. Statement of the purpose of the survey
A general indication of the purpose of the survey should be given in the
report.
b. Description of the coverage
An exact description of the geographical region and of the branches of economic and
social groups covered by the survey should be given in the report.
c. Collection of information
The method of collecting data should be briefly explained, and a copy of the
questionnaire or schedule used for the survey should be attached to the
final report.
d. Numerical results
A general indication should be given of the methods followed in the
derivation of the numerical results.
e. Miscellaneous considerations
It is also important to touch upon such aspects as the period to which the data refer,
the time taken for the field survey, and references to available reports,
journals, publications etc.
Collection of Data
Data may be obtained from either a primary source or a secondary source. Primary
data are data collected by the investigator himself; such data are original in
character. Secondary data, on the other hand, are data which were not originally
collected by the investigator but obtained from published or unpublished sources. Data
which are primary in the hands of one become secondary in the hands of another. For e.g.,
suppose an investigator wants to collect data about the smoking habits of students in
NUALS. If the investigator collects the data himself or through his agents, adopting any
suitable method, the data would constitute primary data for him. On the other hand, if the
student council has already made a similar survey and the investigator or his agent
obtains the data from the union office, such data would constitute secondary data for him.
Advantages of Secondary Data
1. It is highly convenient to use info which someone else has compiled. There
is no need for printing data collection forms, appointing enumerators, or
editing and tabulating the results.
2. Secondary data are much quicker to obtain than the primary data.
3. Secondary data may be available on some subjects where it would be
impossible to collect primary data.
The choice between primary and secondary data depends on:
1. Nature and scope of the enquiry
2. Availability of financial resources
3. Availability of time
4. Degree of accuracy desired
5. Collecting agency
Methods of Collecting Primary Data
1. Direct Personal Interview: Under this method, there is a face to face contact
with the persons from whom the info is to be obtained (informants). The
interviewer asks them questions pertaining to the survey and collects the desired
information.
Merits of Direct Personal Interview
a. Response is more encouraging because most people are willing to supply
info when approached personally.
b. The info obtained by this method is more accurate because the interviewer can
clear the doubts of the informants about certain questions.
c. It is also possible to collect supplementary info about the informant’s personal
characteristics and environment.
d. The questions about which the informant is likely to be sensitive can be
carefully sandwiched between other questions by the interviewer.
e. The language of communication can be adjusted to the status and educational
level of the person interviewed.
Limitations
a. It may be very costly where the number of persons to be interviewed is very
large and they are spread over a wide area.
b. The interviewers have to be thoroughly trained and supervised; otherwise they
may not be able to obtain the desired info.
c. More time is required for collecting info by this method as compared to other
methods because interviews can be held only at the convenience of the
informants.
Indirect Oral Interviews
Under this method, the investigator contacts third parties (known as witnesses)
capable of supplying the necessary information. This method is generally adopted in
those cases where the information to be obtained is complex in nature and the
informants are not willing to respond if approached directly.
The correctness of information obtained depends upon:
1. The type of person whose evidence is being recorded: If the people do not know
the full facts of the problem under investigation, it will not be possible to arrive
at correct conclusions.
2. The ability of the interviewers to draw out the info from the witnesses by means
of appropriate questions and cross-examination.
3. The honesty of the interviewers who collect the info.
Information through Correspondents
Under this method, the investigator appoints local agents or correspondents in
different places to collect information. These correspondents collect the information
and transmit it to the central office where the data are processed. Newspaper agencies
usually adopt this method.
Mail Questionnaire Method
Under this method, a list of questions pertaining to the survey is prepared and
sent to various informants by post. The informants are requested, through a
covering letter, to fill in the questionnaire and send it back within a specified time.
The main advantages are:
• This method can be easily adopted where the field of investigation is very vast
and the informants are spread over a wide geographical area.
• On questions of a personal nature, this method is generally superior to other
methods.
Major limitations are:
• This method can be adopted only where the informants are literate.
• It involves some uncertainty about the response. Cooperation on the part of the
informants may be difficult to presume.
• The information supplied by the informants may not be correct and it may be
difficult to verify its accuracy.
The success of this method depends upon the skill with which the questionnaire is
drafted and the extent to which the willing cooperation of the informants is secured. To
make this method work effectively, the following suggestions are made:
• The questionnaire should be so framed that it doesn’t become an undue burden
on the respondents.
• A self-addressed stamped envelope should be attached.
• The sample should be large.
• A gift coupon may be attached along with the questionnaire.
Schedule method
Under this method, the enumerators contact the informants, get replies to the questions
contained in the schedule and record them in their own handwriting.
The essential difference between the mailed questionnaire method and the schedule method
is that the questionnaire is sent to the informants by post and is filled in by the
informant himself, whereas in the schedule method the enumerators carry the schedule
personally to the informants and fill it in themselves.
Merits
1. It can be adopted in the case of illiterates.
2. There is very little non-response as the enumerators go personally to the field.
3. The information is more reliable.
Demerits
1. Compared to other methods, this method is very costly because the enumerators are
generally paid persons.
2. The success of this method depends upon the training imparted to the enumerators.
Census and Sampling
Under the census method or complete enumeration survey method, data are
collected from each and every unit of the population or universe. For e.g., if an
investigator wants to calculate the average wage of workers in a particular factory, he
should collect the data related to the wages of each and every worker in the factory.
Merits of Census Method
1. Data are obtained from each and every unit of the population.
2. The results obtained are likely to be more representative, accurate and reliable.
3. It is an appropriate method of obtaining info on certain things like age, group of
workers, educational level etc.
Sampling
Sampling is simply the process of learning about the population on the basis of
the sample drawn from it. Thus, in the sampling technique, instead of every unit of the
universe, only a part of the universe is studied and the conclusions are drawn on that
basis for the entire universe.
For e.g., a housewife examines only 2 or 3 grains of boiling rice to know
whether the entire pot of rice is ready or not.
Essentials of sampling
For the sample results to have any meaning, it is necessary that a sample should
possess the following essentials:
1. Representativeness: A sample should be so selected that, it truly represents the
universe. To ensure representativeness, random method of selection should be used.
2. Adequacy: The size of the sample should be adequate; otherwise it may not
represent the characteristics of the universe.
3. Independence: All items of the sample should be selected independently of one
another; only then will all items of the universe have the same chance of being
selected in the sample.
4. Homogeneity: here, it means that there is no basic difference in the nature of the
units of the universe and that of the sample.
Methods of sampling
Various methods of sampling can be grouped under 2 broad heads:
a) Probability sampling (random sampling): simple random, stratified, systematic,
cluster
b) Non-probability sampling (non-random sampling): judgement, convenient, quota
Probability sampling methods are those in which every item in the universe has a known
chance, or probability, of being chosen for the sample. This implies that the selection of
sample items is independent of the person making the study.
Non-probability sampling methods are those which do not provide every item in the
universe with a known chance of being included in the sample.
Different Methods of Probability sampling are:
1. Simple Random Sampling
2. Stratified Random Sampling
3. Systematic Sampling
4. Cluster Sampling
Simple Random Sampling
Simple random sampling refers to that sampling technique in which each and
every unit of the population has an equal opportunity of being selected in the sample.
Two methods are used to select the sample:
1. Lottery method
2. Random Number Table method
Lottery Method: In the lottery method, all items of the universe are numbered or named
on separate slips of paper of identical size, shape, colour etc. These slips are then
folded and mixed up in a bowl. A blindfold selection is then made of the number of
slips required to constitute the desired sample size.
Merits:
1. Since the selection of items in the sample depends entirely on chance, there is
no possibility of personal bias affecting the results.
2. As the size of the sample increases, it becomes increasingly representative of the
population.
Demerits:
1. The use of simple random sampling necessitates a completely catalogued
universe from which to draw the sample. But it is often difficult for the
investigator to have an up-to-date list of all the items of the population to be
sampled.
2. From the point of view of field survey, it has been claimed that cases selected
by random sampling tend to be too widely dispersed geographically and that the
time and cost of collecting data become too large.
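The lottery method described above can be imitated on a computer. The following is only a minimal sketch in Python, assuming a universe of 60 numbered units; the function name lottery_sample and the seed parameter are illustrative and do not come from these notes.

```python
import random

def lottery_sample(units, sample_size, seed=None):
    """Draw `sample_size` units at random, without replacement."""
    rng = random.Random(seed)      # the seed only makes the draw repeatable
    return rng.sample(list(units), sample_size)

# Hypothetical universe: 60 numbered slips in the bowl.
universe = list(range(1, 61))
sample = lottery_sample(universe, 10, seed=1)
print(sorted(sample))
```

Every slip has the same chance of being drawn, and no slip can be drawn twice, which mirrors the blindfold selection from the bowl.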
Stratified Sampling
Under this method, the universe is subdivided into different groups (strata) and a
sample is then chosen independently from each group by either the lottery method or
the random number table method. Stratification is based on some common characteristics of the
data. For example, if we want to collect data regarding the consumption pattern of
people in India, the country is divided into different states. Again, states are divided
into different districts. Districts are then divided into zones. Zones are then divided into
Wards, etc. And from each part, a sample may be taken at random.
The next step is to select the sample size within each stratum. Usually proportionate
stratified sampling is used. It means that the number of items drawn from each stratum
is proportional to the size of the stratum. For example, suppose the population is divided
into three groups, A, B and C, consisting of 300, 600 and 900 people respectively (1800
in all), and a sample of size 600 is to be selected from these 3 groups.
Based on the proportionate stratified sampling technique,
A=(300 x 600)/1800=100
B=(600 x 600)/1800=200
C=(900 x 600)/1800=300
From Group A, 100 units, from Group B, 200 units and from Group C, 300
units are selected.
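The allocation worked out above can be checked with a few lines of Python. This is a sketch only; the function name proportional_allocation is illustrative, and the rounding step is an assumption that matters only when the shares are not whole numbers (the rounded figures may then not add up exactly to the sample size).

```python
def proportional_allocation(strata_sizes, sample_size):
    """Return the number of units to draw from each stratum."""
    population = sum(strata_sizes.values())
    return {name: round(size * sample_size / population)
            for name, size in strata_sizes.items()}

# Strata sizes and sample size taken from the example in the notes.
allocation = proportional_allocation({"A": 300, "B": 600, "C": 900}, 600)
print(allocation)  # {'A': 100, 'B': 200, 'C': 300}
```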
Merits
1. Since the population is first divided into various strata and a sample is then
drawn from each stratum, there is little possibility of any essential group of the
population being completely excluded. (More representativeness)
2. Each stratum is so framed that it consists of uniform or homogeneous items. So,
greater accuracy is there in the selection of samples. (Greater Accuracy)
3. As compared to random sample method, stratified samples have more
geographical concentration, i.e.; units from the different strata may be selected
in such a way that all of them are localised in one geographical area.
The main disadvantage of this method is that, if proper stratification of the
population is not done, the sample may have the effect of bias.
Systematic Sampling
It is formed by selecting one unit at random and then selecting additional units
at evenly spaced intervals until the sample has been formed. This method is popularly
used in those cases where a complete list of the population from which the sample is to
be drawn is available.
The list may be prepared in alphabetical, geographical, numerical or some other
order. The items are serially numbered. The first item is selected at random by lottery
method. Subsequent items are selected by taking every kth item from the list. Here k refers to
the sampling interval or sampling ratio, i.e., the ratio of the population size to the size of the sample:
k=N/n
where k is the sampling interval, N is the size of the universe and n is the sample size.
The merits of this method are that it is simple and convenient to adopt. Time and work
involved in sampling by this method are relatively less.
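A small Python sketch of systematic selection with k = N/n is given here. It assumes a serially numbered list of 100 units and a sample of 10; the function name systematic_sample and the seed parameter are illustrative, and integer division is used for k so that the sketch also works when N is not an exact multiple of n.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick every k-th item after a random start, with k = N // n."""
    N = len(items)
    k = N // n                                  # sampling interval
    start = random.Random(seed).randrange(k)    # random start among the first k items
    return [items[start + i * k] for i in range(n)]

serial_numbers = list(range(1, 101))   # hypothetical list of N = 100 units
sample = systematic_sample(serial_numbers, 10, seed=7)
print(sample)
```

Only the first unit is chosen by chance; every later unit follows at a fixed interval of k = 10 places in the list.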
Cluster or Multi-Stage Sampling
Under this method, a random selection is made of primary, intermediate and final
units from a given population or stratum.
There are several stages in which the sampling process is carried out.
At the first stage, units are sampled by some suitable method such as simple random
sampling.
Then a sample of second-stage units is selected from each of the selected first-stage
units, again by some suitable method which may be the same as or different from the
method employed for the first-stage units. Further stages may be added as required. For
example, suppose we want to take a sample of 500 households from the state of UP.
At the first stage, the state may be divided into a number of districts and a few districts
selected at random.
At the second stage, each selected district may be subdivided into a number of villages
and a sample of villages may be taken at random.
At the third stage, a number of households may be selected from each of the villages
selected at the second stage.
The advantages of this method are:
1. It introduces flexibility in the sampling method.
2. Sub-division into second-stage units is carried out for only those first-stage
units which are included in the sample.
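The three stages of the household example can be sketched in Python as follows. The frame of 5 districts, 4 villages per district and 20 households per village is entirely hypothetical, as are the function name multistage_sample and the numbers selected at each stage; simple random sampling is assumed at every stage.

```python
import random

def multistage_sample(frame, n_districts, n_villages, n_households, seed=None):
    """Select districts, then villages within them, then households within those."""
    rng = random.Random(seed)
    chosen = []
    for district in rng.sample(sorted(frame), n_districts):          # first stage
        villages = frame[district]
        for village in rng.sample(sorted(villages), n_villages):     # second stage
            for household in rng.sample(villages[village], n_households):  # third stage
                chosen.append((district, village, household))
    return chosen

# Hypothetical frame: district -> village -> list of household labels.
frame = {
    f"D{d}": {f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(1, 21)]
              for v in range(1, 5)}
    for d in range(1, 6)
}
households = multistage_sample(frame, n_districts=2, n_villages=2,
                               n_households=5, seed=3)
print(len(households))  # 2 districts x 2 villages x 5 households = 20
```

Note that villages are listed only for the districts actually selected, which is the second advantage stated above.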
Non-Probability Sampling
1. Judgement Sampling: In this method, the choice of the sample items
depends exclusively on the judgement of the investigator.
In other words, the investigator exercises his judgement in the choice and
includes those items in the sample which he thinks are most typical of the
universe with regard to the characteristics under investigation.
e.g., if a sample of ten students is to be selected from a class of 60 for analyzing the
spending habits of the students, the investigator would select 10 students who in his
opinion are representative of the class.
Limitations
This method is not scientific because the population units to be sampled may be
affected by personal prejudice or bias of the investigator. For example, if an
investigator holds the view that the wages of workers in a certain establishment are
very low and if he adopts judgement sampling method, he may include only those
workers whose wages are low and thereby establish his point of view which may be far
from the truth.
Convenient Sampling
A convenient sample is obtained by selecting convenient population units. The method
of convenient sampling is also called ‘chunk’. A chunk refers to that fraction of the
population being investigated which is selected neither by probability nor by judgement
sampling, but by convenience.
A sample obtained from a readily available list such as a telephone directory is a convenient
sample. For example, if a person is to submit a project report on labour management
relations in the textile industry and he takes a textile mill close to his office and interviews
some people there, he is following the convenient sampling method.
Convenient sampling is often used for making pilot studies or pre-testing the
questionnaire.
Quota Sampling
It is a type of judgement sampling and a commonly used sampling technique in the
non-probability category. In a quota sample, quotas are set up according to some specified
characteristics. Each interviewer is then asked to interview a certain number of persons
which constitute his quota. Within the quota, the selection of sample items depends on
personal judgement. For example, in a radio listening survey, the interviewer may be
asked to interview people living in a certain area. Quotas may consist of housewives,
farmers, children etc. Within these quotas, the interviewer is free to select the sample.
Quota sampling and stratified sampling are almost similar. In both methods, the
universe is divided into different parts and the sample is selected from each part. The
only difference is that in stratified random sampling, the sample within each stratum is
selected at random. But in quota sampling, the sample within the quotas is not selected
at random.
Merits of Sampling
1. Less time consuming: Since the sample is a study of a part of the population,
considerable time and labour are saved when a sample survey is carried out.
Time is saved not only in collecting data, but also in processing of it.
2. Less Cost: The total financial burden of a sample survey is generally less than
that of complete enumeration. This is because of the fact that in sampling, we
study only a part of the population and the total expense of collecting data is
less than that required in census method.
3. More Detailed Info: Since sampling techniques save time and labour, it is
possible to collect more detailed info in sample survey.
4. Sampling is the only method that can be used in certain cases. For
example, if an investigator is interested in testing the breaking strength of chalks
manufactured in a factory, under the census method all the chalks would be broken
in the process of testing.
Limitations of Sampling
1. A sample survey must be carefully planned and executed. Otherwise, the results
obtained may be inaccurate and misleading.
2. Sometimes the sampling plan may be so complicated that it requires more
time, labour and money than a complete count. This is the case when the size of the
sample is a large proportion of the total population.
Classification and Tabulation of Data
Classification of Data
The process of arranging things in groups or classes according to their common
characteristics is called classification of data. According to Secrist, “Classification is
the process of arranging data into sequences and groups according to their common
characteristics or separating them into different but related parts.”
Requisites of a Good Classification
The main characteristics of a good classification are:
1. It should be exhaustive: Classification must be exhaustive in the sense that
each and every item in the data must belong to one of the classes.
2. It should be unambiguous: Classification is meant for removing ambiguity. It
is necessary that various classes should be so defined that there is no room for
doubt or confusion.
3. It should be mutually exclusive: Each item of the given data should fit only in
one class. In other words, the classes must not overlap.
4. It should be homogeneous: The items included in each class must be
homogeneous. Otherwise, there may be further classification into sub groups.
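The first and third requisites (exhaustive and mutually exclusive classes) can be checked mechanically. The sketch below is in Python and assumes classes of the form "lower limit included, upper limit excluded"; the class limits, the marks and the function name classify are hypothetical.

```python
def classify(values, classes):
    """Put each value into the one class (lower, upper) with lower <= value < upper."""
    table = {c: [] for c in classes}
    for value in values:
        matches = [c for c in classes if c[0] <= value < c[1]]
        if len(matches) != 1:
            # no match: classes are not exhaustive; several matches: classes overlap
            raise ValueError(f"{value} falls in {len(matches)} classes")
        table[matches[0]].append(value)
    return table

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]   # hypothetical mark classes
marks = [5, 12, 25, 38, 19, 10]
table = classify(marks, classes)
frequencies = {c: len(items) for c, items in table.items()}
print(frequencies)
```

A mark of 40 would raise an error here because no class contains it, which signals that the classification is not exhaustive.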
Purpose of Classification of data
1. It condenses the mass of data and ignores the unnecessary details, thereby
making available input data to study or survey.
2. It facilitates comparison between data.
3. It helps in studying the relationship between several characteristics.
4. It facilitates further statistical treatments.
5. It helps in preparing the data for tabulation.
6. It presents facts in a simple form.
7. It brings out clearly the points of similarity and dissimilarity.
Types of Classification
1. Quantitative Classification: When the basis of classification is according to
differences in quantity, the classification is called quantitative classification. In
other words, quantitative classification is made according to numerical size. A
quantitative classification is the classification which is based on such
characteristics which are capable of quantitative measurement such as height,
weight, marks obtained etc of individuals. Here, height, weight etc are variables
and the number of persons indicates the frequency.
2. Temporal Classification / Chronological Classification: When the basis of
classification is according to differences in time, the classification is called
temporal or chronological classification. For e.g., the students who got first
division during the last three years are classified year wise.
3. Spatial / Geographical Classification: When the basis of classification is
according to geographical location or place, such classification is called spatial
or geographical classification. For e.g., the crime rate in different states.
4. Qualitative Classification: When the basis of classification is according to
characteristics or attributes like social status etc, it is called qualitative
classification. For e.g., educated and uneducated persons, married and
unmarried persons.
Classification of this nature is of two types:
1. Simple classification
2. Manifold classification
If the data are classified into only two categories according to the presence or absence
of a single attribute, such classification is known as simple or twofold or
dichotomous classification. For e.g., the population of India may be divided into males
and females. Manifold classification is a classification where two or more attributes
are involved. For e.g., when the populations of males and females are further subdivided
into literates and illiterates, there are two attributes under study.
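The difference between simple and manifold classification can be seen by counting the same records in two ways. The following Python sketch uses five made-up records of (sex, literacy); the data and variable names are purely illustrative.

```python
from collections import Counter

# Hypothetical records: one (sex, literacy) pair per person.
people = [
    ("male", "literate"),
    ("female", "literate"),
    ("male", "illiterate"),
    ("female", "literate"),
    ("male", "literate"),
]

simple = Counter(sex for sex, _ in people)   # one attribute: twofold classification
manifold = Counter(people)                   # two attributes: manifold classification

print(simple)
print(manifold)
```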
Tabulation of Data
The last stage in the compilation of data is tabulation. After the data have been
collected and classified, it is essential to put them in the form of tables. Tabulation is a
scientific process used in setting out the collected data in an understandable form.
According to Prof. Cuttle, tabulation is “the logical listing of related quantitative data in vertical
columns and horizontal rows of numbers with sufficient explanatory and qualifying
words, phrases and statements in the form of title, headings and explanatory notes to
make clear the full meaning, context and origin of the data.”
Objectives of Tabulation
1. To simplify the complex data: In the process of tabulation, the unnecessary
details are avoided. All tabular data are presented in such a manner that they
become more meaningful and can be easily understood by a common man.
2. To clarify the objective of investigation: The purpose of tabulation is to
arrange the data in an easily accessible form so as to bring out the answers with which
the investigation is concerned.
3. To facilitate comparison: It facilitates comparison of data shown in rows and
columns. Sometimes, comparable figures are placed in columns or rows.
4. To depict trend and pattern of data: Tabulation of data shows the trend of
info under the study. It reveals the patterns within the figures which cannot be
understood in a descriptive form of presentation.
5. To help reference for future studies: Data arranged in tables with titles and
table numbers can be easily identified and made use of as source reference for
future use and studies.
6. To facilitate statistical analysis: It is only after classification and tabulation
that the statistical data becomes fit for analysis and interpretation. Various
statistical measures such as averages, dispersion, correlation etc can be
calculated from the data which is systematically classified and tabulated.
Difference between Classification and Tabulation
The basic points of difference between classification and tabulation, although
the two are closely related, are as given below:
1. Classification of data is a process of statistical analysis while tabulation is a
process of presentation.
2. Classification is the basis for tabulation because the data is classified first
and then tabulated.
3. In classification, the data is divided into various groups and sub-groups
based on their similarities and dissimilarities, while tabulation is a process
of arranging the classified data in rows and columns with suitable heads and
sub-heads.
Essential Parts of a Statistical Table
1. Table Number: A table should be numbered for identification, especially,
when there are a large number of tables in a study. The number may be put at
the centre above the title.
2. Title of the Table: Every table should have a title. It should be clear, brief and
self-explanatory. The title should be set in bold type so as to give it prominence.
3. Stub / Row Heading: Each row of the table must have a heading. The headings
of the rows are called stubs. Stubs clarify the figures in the rows. As far as
possible, the items should be condensed so that they can be included in a single
row.
4. Caption / Column Heading: A table has many columns and the sub-headings
of the columns are called captions or column headings. They should be well-
defined and brief.
5. Body of the Table: It is the most vital part of a table. It contains numerical
values. It should be made as comprehensive as possible. The actual data should
be arranged in such a manner that any figure may be readily located.
6. Unit of Measurement: The unit of measurement should be stated along with
the title, if this is uniform throughout. If different units have been adopted, then
they should be stated along the stub or caption.
7. Source Notes: A note at the bottom of the table should always be given to
indicate the primary source as well as the secondary source from where the data
has been taken, particularly when there is more than one source.
8. Footnotes and References: These are always placed at the bottom of the table. A
footnote is a statement explaining some specific item which cannot be
understood by the reader from the title, captions and stubs.
Measures of central tendency
A single expression, representing the whole group, is selected which
may convey a fairly adequate idea about the whole group. This single
expression in statistics is known as the average. Five types of measures
of central tendency, or averages, are commonly used. These are:
Arithmetic Mean
Median
Mode
Harmonic Mean
Geometric Mean
Features of good average
According to Prof. G.U. Yule, a good average must have the following
characteristics
1. It should be rigidly defined so that different persons may not
interpret it differently.
2. It should be easy to understand and easy to calculate.
3. It should be based on all the observations of the data.
4. It should be easily subjected to further mathematical
calculations.
5. It should not be unduly affected by the extreme values.
6. It should be easy to interpret.
7. It should have sampling stability. It means that if the average is
computed for similar groups, the result should also be similar.
Arithmetic Mean
Merits
1. It can be easily calculated.
2. Its calculations are based on all the observations.
3. It is easy to understand.
4. It is rigidly defined by the mathematical formula.
5. It is least affected by sampling fluctuations.
6. It is the best measure to compare two or more series.
7. It is the average obtained by calculations and it does not
depend upon any position.
De-merits
1. It may not be represented in the actual data and so it is
theoretical.
2. The extreme values have a greater effect on the mean.
3. It cannot be calculated if all the values are not known.
4. It cannot be determined for the qualitative data.
Uses
1. A common man uses mean for calculating average marks
obtained by a student.
2. Estimates are always obtained by mean.
3. Businessmen use it to find out the operation cost, profit,
average monthly income etc.
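The effect of extreme values on the arithmetic mean can be seen in a short Python sketch using the standard statistics module. The two wage series are made up for illustration; they differ only in the last item.

```python
from statistics import mean

wages = [10, 12, 14, 16, 18]                # hypothetical daily wages
wages_with_extreme = [10, 12, 14, 16, 100]  # same series with one extreme value

mean_plain = mean(wages)                 # 70 / 5 = 14
mean_extreme = mean(wages_with_extreme)  # 152 / 5 = 30.4

print(mean_plain, mean_extreme)
```

A single extreme item more than doubles the mean, although four of the five values are unchanged.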
Median
Median is defined as the middlemost or central value of the
variable in a set of observations, when the observations are arranged in
ascending or descending order of their magnitude.
Merits
1. It is easily understood.
2. It is not affected by extreme items.
3. It can be located graphically.
4. It is the best measure for qualitative data.
5. It can be easily located even if class intervals in the series are
unequal.
6. It can be determined even by inspection.
De-merits
1. It is not subject to algebraic treatment.
2. It cannot represent an irregular distribution series.
3. It is a positional average and is based on the middle item.
4. It is an estimate in case of a series containing even number of
items.
5. It does not take into account the values of all the items in the
series.
Uses
1. It is useful in those cases where numerical measurements are
not possible.
2. It is also useful in those cases where mathematical calculations
cannot be made in order to obtain the mean.
3. It is generally used in studying phenomena like skill,
honesty etc.
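A brief Python sketch, using the standard statistics module, shows how the median is located for an odd and an even number of items and how little it reacts to an extreme value. The three series are hypothetical.

```python
from statistics import median

odd_series = [3, 1, 7, 5, 9]          # sorted: 1 3 5 7 9, middle item is 5
even_series = [1, 3, 5, 7]            # no single middle item
with_extreme = [10, 12, 14, 16, 100]  # one extreme value

median_odd = median(odd_series)         # 5
median_even = median(even_series)       # mean of the two middle items: 4.0
median_extreme = median(with_extreme)   # 14, unaffected by the value 100

print(median_odd, median_even, median_extreme)
```

For the even series the result is an estimate lying between the two middle items, which is the fourth demerit listed above.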
Mode
Mode is that value in a series which occurs most frequently. A series
of observations which contains only one mode is called a unimodal
series. A series of observations which contains two modes is called a
bimodal series. If a series of observations has more than one mode then
the mode is said to be ill defined.
Merits
1. It can be easily understood.
2. It can be located in some cases by inspection.
3. It is capable of being ascertained graphically.
4. It is not affected by extreme values.
5. It represents the most frequent value and hence it is very often used in
practice.
6. The arrangement of data is not necessary if the items are a few.
Demerits
1. There are different formulae for its calculation.
2. Mode is indeterminate in some cases. Some series have two or more than two
modes.
3. It cannot be subjected to algebraic treatment, i.e., a combined mode
cannot be calculated from the modes of two series.
4. Mode for the series with unequal class intervals cannot be
calculated.
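Unimodal and bimodal series can be told apart by listing all the most frequent values. The Python sketch below uses multimode from the standard statistics module (available from Python 3.8); both series are invented for illustration.

```python
from statistics import multimode

unimodal_series = [1, 2, 2, 3]
bimodal_series = [2, 3, 3, 4, 5, 5]

modes_one = multimode(unimodal_series)   # [2]
modes_two = multimode(bimodal_series)    # [3, 5]

print(modes_one, modes_two)
```

When the list of modes has more than one entry, the mode of the series is ill defined in the sense described above.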
Harmonic Mean
Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the
observations.
Merits
1. Easy to calculate.
2. It is rigidly defined.
3. It gives largest weight to the smallest items.
4. It is a useful average when we deal with the average of rates.
Demerits
It cannot be located by inspection
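The use of the harmonic mean for averaging rates can be illustrated in Python. The example assumes a journey made at 40 km/h one way and 60 km/h back over the same distance; the figures and the function name harmonic_mean are hypothetical.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [40, 60]                                # km/h over two equal distances
average_speed = harmonic_mean(speeds)            # about 48 km/h
arithmetic_mean = sum(speeds) / len(speeds)      # 50.0 km/h

print(average_speed, arithmetic_mean)
```

The harmonic mean is lower than the arithmetic mean because it gives the largest weight to the smallest item, here the slower speed.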
MEASURES OF DISPERSION
The meaning of dispersion is scatteredness. It helps in finding out the
variability of the data or the scatteredness of individual items in a given
distribution. In other words, the degree to which numerical data tend to spread about
an average value is called the dispersion of the data. The term is used in
two senses. The first sense relates to the limits within which the data
fall and the second takes into account the amount, absolute or relative,
by which the values of the items differ from an average.
RANGE
It is the difference between the maximum and minimum items of the
series.
Merits
1. It can be easily understood.
2. It is easy to calculate and it is the simplest method of measuring
dispersion.
3. It lends itself to algebraic treatment.
4. It is an absolute measure of dispersion.
Demerits
1. It is too indefinite to be used as a practical measure of dispersion
because it depends entirely upon the extreme values.
2. It is not based on all the items.
3. It is affected by sampling fluctuations.
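The dependence of the range on the extreme values is easy to see in a short Python sketch. The two series are hypothetical and differ only in their largest item; the function name value_range is illustrative.

```python
def value_range(values):
    """Range = largest item minus smallest item."""
    return max(values) - min(values)

series = [10, 12, 14, 16, 18]
series_with_extreme = [10, 12, 14, 16, 100]

range_plain = value_range(series)                  # 18 - 10 = 8
range_extreme = value_range(series_with_extreme)   # 100 - 10 = 90

print(range_plain, range_extreme)
```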
Quartile deviation
Quartile deviation is a measure of dispersion based on the upper
quartile and lower quartile of a series. It is half of the difference
between the upper quartile and the lower quartile.
Merits
1. It is easy to calculate
2. It can be easily understood.
3. It is not affected by extreme values.
4. It has a special utility in measuring variation in case of frequency
distribution with open end classes at both ends.
Demerits
1. It is not based on all the observations.
2. It is not a representative value of the data.
3. It is not capable of further algebraic calculations.
4. It cannot be regarded as a measure of dispersion as it really does
not show the scatteredness around an average but rather a
distance on a scale.
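Quartile deviation can be computed in Python with quantiles from the standard statistics module (Python 3.8 or later). This is a sketch under an assumption: the default "exclusive" method is used, which places the quartiles at positions (n+1)/4 and 3(n+1)/4 of the ordered series; other quartile conventions exist and can give slightly different values. The series and the function name quartile_deviation are illustrative.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Half the difference between the upper and lower quartiles."""
    q1, _, q3 = quantiles(values, n=4)   # lower quartile, median, upper quartile
    return (q3 - q1) / 2

series = [2, 4, 6, 8, 10, 12, 14]   # n = 7, so Q1 is the 2nd item and Q3 the 6th
qd = quartile_deviation(series)     # (12 - 4) / 2 = 4.0

print(qd)
```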
Mean Deviation
Mean deviation of a set of observations is the arithmetic
mean of all the deviations, without their algebraic signs, taken from
its central value.
Merits
1. It is easy to understand and compute.
2. Mean deviation is less affected by the extreme values as
compared to standard deviation.
Demerits
1. In mean deviation the signs of all deviations are taken as positive
and therefore, it is not suitable for further algebraic treatment.
2. It is rarely used in social sciences.
3. It does not give accurate results because the mean deviation from
the median is least but median itself is not considered a
satisfactory average when the variation in the series is large.
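The mean deviation, and the point that it is least when taken from the median, can be checked with a small Python sketch. The series and the function name mean_deviation are hypothetical.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Arithmetic mean of the deviations from `centre`, ignoring signs."""
    return sum(abs(v - centre) for v in values) / len(values)

series = [1, 2, 3, 4, 10]   # mean is 4, median is 3

md_from_mean = mean_deviation(series, mean(series))       # (3+2+1+0+6) / 5 = 2.4
md_from_median = mean_deviation(series, median(series))   # (2+1+0+1+7) / 5 = 2.2

print(md_from_mean, md_from_median)
```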
Standard deviation
Standard deviation is the positive square root of the average of the squared
deviations taken from the arithmetic mean.
Merits
1. It is based on all the observations.
2. It is rigidly defined.
3. It has a great mathematical significance and is capable of further
mathematical treatment.
4. It represents the true measurement of dispersion of a series.
5. It is a reliable and dependable measure of dispersion.
6. It is useful in correlation analysis.
Demerits
1. It is difficult to compute unlike other measures of dispersion.
2. It is not simple to understand.
3. It gives more weightage to extreme values.
4. It consumes more time and labour while computing it.
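The definition of standard deviation given above can be followed step by step in Python. The sketch divides by the number of observations N, as the definition "average of squared deviations" suggests; this corresponds to pstdev in the standard statistics module, whereas stdev divides by N - 1. The series and the function name standard_deviation are illustrative.

```python
from math import sqrt
from statistics import pstdev

def standard_deviation(values):
    """Positive square root of the average squared deviation from the mean."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

series = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations add up to 32
sd = standard_deviation(series)     # sqrt(32 / 8) = 2.0

print(sd, pstdev(series))
```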