Methods of Data Collection Collection of Primary Data


Methods of Data Collection

COLLECTION OF PRIMARY DATA

We collect primary data in the course of doing experiments in experimental
research. But when we do research of the descriptive type and perform surveys,
whether sample surveys or census surveys, we can obtain primary data either
through observation or through direct communication with respondents in one form or
another or through personal interviews. This, in other words, means that there are
several methods of collecting primary data, particularly in surveys and descriptive
research. Important ones are:

i. observation method,
ii. interview method,
iii. through questionnaires,
iv. through schedules, and
v. other methods which include

a. warranty cards;
b. distributor audits;
c. pantry audits;
d. consumer panels;
e. using mechanical devices;
f. through projective techniques;
g. depth interviews, and
h. content analysis.

We briefly take up each method separately.


OBSERVATION METHOD

The observation method is the most commonly used method, especially in studies
relating to behavioral sciences. In a way we all observe things around us, but this sort
of observation is not scientific observation. Observation becomes a scientific tool and
the method of data collection for the researcher, when it serves a formulated research
purpose, is systematically planned and recorded and is subjected to checks and
controls on validity and reliability. Under the observation method, the information is
sought by way of the investigator’s own direct observation without asking the
respondent. For instance, in a study relating to consumer behavior, the investigator
instead of asking the brand of wrist watch used by the respondent, may himself look at
the watch. The main advantage of this method is that subjective bias is eliminated, if
observation is done accurately. Secondly, the information obtained under this method
relates to what is currently happening; it is not complicated by either the past behavior
or future intentions or attitudes. Thirdly, this method is independent of respondents’
willingness to respond and as such is relatively less demanding of active cooperation
on the part of respondents as happens to be the case in the interview or the
questionnaire method. This method is particularly suitable in studies which deal with
subjects (i.e., respondents) who are not capable of giving verbal reports of their
feelings for one reason or the other. However, the observation method has various
limitations. Firstly, it is an expensive method. Secondly, the information provided by
this method is very limited. Thirdly, sometimes unforeseen factors may interfere with
the observational task. At times, the fact that some people are rarely accessible to
direct observation creates an obstacle to collecting data effectively by this method.
While using this method, the researcher should keep in mind things like: What should
be observed? How should the observations be recorded? How can the accuracy of
observation be ensured? In case the observation is characterized by a careful
definition of the units to be observed, the style of recording the observed information,
standardized conditions of observation and the selection of pertinent data of
observation, then the observation is called structured observation. But when
observation takes place without these characteristics being thought of in advance,
it is termed unstructured observation. Structured observation is considered
appropriate in descriptive studies, whereas in an exploratory study the observational
procedure is most likely to be relatively unstructured.
We often talk about participant and non-participant types of observation in the context
of studies, particularly of social sciences. This distinction depends upon the observer’s
sharing or not sharing the life of the group he is observing. If the observer observes by
making himself, more or less, a member of the group he is observing so that he can
experience what the members of the group experience, the observation is called
participant observation. But when the observer observes as a detached emissary
without any attempt on his part to experience through participation what others feel,
the observation of this type is often termed non-participant observation. (When the
observer is observing in such a manner that his presence may be unknown to the
people he is observing, such an observation is described as disguised observation.)
There are several merits of the participant type of observation: The researcher is
enabled to record the natural behavior of the group. The researcher can even gather
information which could not easily be obtained if he observes in a disinterested
fashion. The researcher can even verify the truth of statements made by informants in
the context of a questionnaire or a schedule. But there are also certain demerits of this
type of observation viz., the observer may lose the objectivity to the extent he
participates emotionally; the problem of observation-control is not solved; and it may
narrow down the researcher’s range of experience.
Sometimes we talk of controlled and uncontrolled observation. If the observation
takes place in the natural setting, it may be termed as uncontrolled observation, but
when observation takes place according to definite pre-arranged plans, involving
experimental procedure, the same is then termed controlled observation. In non-
controlled observation, no attempt is made to use precision instruments. The major
aim of this type of observation is to get a spontaneous picture of life and persons. It
has a tendency to supply naturalness and completeness of behavior, allowing
sufficient time for observing it. But in controlled observation, we use mechanical (or
precision) instruments as aids to accuracy and standardization. Such observation has a
tendency to supply formalized data upon which generalizations can be built with some
degree of assurance. The main pitfall of non-controlled observation is that of
subjective interpretation. There is also the danger of having the feeling that we know
more about the observed phenomena than we actually do. Generally, controlled
observation takes place in various experiments that are carried out in a laboratory or
under controlled conditions, whereas uncontrolled observation is resorted to in case of
exploratory research.

INTERVIEW METHOD

The interview method of collecting data involves presentation of oral-verbal stimuli
and reply in terms of oral-verbal responses. This method can be used through personal
interviews and, if possible, through telephone interviews.

Personal interviews: The personal interview method requires a person known as the
interviewer to ask questions, generally in face-to-face contact with the other person or
persons. (At times the interviewee may also ask certain questions and the interviewer
responds to these, but usually the interviewer initiates the interview and collects the
information.) This sort of interview may be in the form of direct personal
investigation or it may be indirect oral investigation. In the case of direct personal
investigation the interviewer has to collect the information personally from the
sources concerned. He has to be on the spot and has to meet people from whom data
have to be collected. This method is particularly suitable for intensive investigations.
But in certain cases it may not be possible or worthwhile to contact directly the
persons concerned or on account of the extensive scope of enquiry, the direct personal
investigation technique may not be used. In such cases an indirect oral examination
can be conducted under which the interviewer has to cross-examine other persons who
are supposed to have knowledge about the problem under investigation and the
information obtained is recorded. Most of the commissions and committees appointed
by government to carry on investigations make use of this method.
The method of collecting information through personal interviews is usually carried
out in a structured way. As such we call these interviews structured interviews. Such
interviews involve the use of a set of predetermined questions and of highly
standardized techniques of recording. Thus, the interviewer in a structured interview
follows a rigid procedure laid down, asking questions in a form and order prescribed.
As against it, the unstructured interviews are characterized by a flexibility of approach
to questioning. Unstructured interviews do not follow a system of pre-determined
questions and standardized techniques of recording information. In a non-structured
interview, the interviewer is allowed much greater freedom to ask, in case of need,
supplementary questions or at times he may omit certain questions if the situation so
requires. He may even change the sequence of questions. He has relatively greater
freedom while recording the responses to include some aspects and exclude others.
But this sort of flexibility results in lack of comparability of one interview with
another and the analysis of unstructured responses becomes much more difficult and
time-consuming than that of the structured responses obtained in case of structured
interviews. Unstructured interviews also demand deep knowledge and greater skill on
the part of the interviewer. Unstructured interview, however, happens to be the central
technique of collecting information in case of exploratory or formulative research
studies. But in case of descriptive studies, we quite often use the technique of
structured interview because of its being more economical, providing a safe basis for
generalization and requiring relatively less skill on the part of the interviewer.
We may as well talk about focused interview, clinical interview and the non-directive
interview. Focused interview is meant to focus attention on the given experience of
the respondent and its effects. Under it the interviewer has the freedom to decide the
manner and sequence in which the questions would be asked and has also the freedom
to explore reasons and motives. The main task of the interviewer in case of a focused
interview is to confine the respondent to a discussion of issues with which he seeks
conversance. Such interviews are used generally in the development of hypotheses
and constitute a major type of unstructured interviews. The clinical interview is
concerned with broad underlying feelings or motivations or with the course of
individual’s life experience. The method of eliciting information under it is generally
left to the interviewer’s discretion. In case of non-directive interview, the
interviewer’s function is simply to encourage the respondent to talk about the given
topic with a bare minimum of direct questioning. The interviewer often acts as a
catalyst to a comprehensive expression of the respondents’ feelings and beliefs and of
the frame of reference within which such feelings and beliefs take on personal
significance.
Despite the variations in interview-techniques, the major advantages and weaknesses
of personal interviews can be enumerated in a general way. The chief merits of the
interview method are as follows:

1. More information and that too in greater depth can be obtained.

2. Interviewer by his own skill can overcome the resistance, if any, of the
respondents; the interview method can be made to yield an almost perfect
sample of the general population.

3. There is greater flexibility under this method as the opportunity to restructure
questions is always there, especially in case of unstructured interviews.

4. Observation method can as well be applied to recording verbal answers to
various questions.

5. Personal information can as well be obtained easily under this method.

6. Samples can be controlled more effectively as there arises no difficulty of the
missing returns; non-response generally remains very low.

7. The interviewer can usually control which person(s) will answer the questions.
This is not possible in mailed questionnaire approach. If so desired, group
discussions may also be held.

8. The interviewer may catch the informant off-guard and thus may secure more
spontaneous reactions than would be the case if a mailed questionnaire is
used.

9. The language of the interview can be adapted to the ability or educational level
of the person interviewed and as such misinterpretations concerning questions
can be avoided.

10. The interviewer can collect supplementary information about the respondent’s
personal characteristics and environment which is often of great value in
interpreting results.

But there are also certain weaknesses of the interview method. Among the important
weaknesses, mention may be made of the following:

1. It is a very expensive method, especially when a large and widely spread
geographical sample is taken.

2. There remains the possibility of the bias of interviewer as well as that of the
respondent; there also remains the headache of supervision and control of
interviewers.

3. Certain types of respondents such as important officials or executives or people
in high income groups may not be easily approachable under this method and
to that extent the data may prove inadequate.

4. This method is relatively more time-consuming, especially when the sample is
large and recalls upon the respondents are necessary.

5. The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to
make the interview interesting.

6. Under the interview method the organization required for selecting, training
and supervising the field-staff is more complex with formidable problems.

7. Interviewing at times may also introduce systematic errors.

8. Effective interview presupposes proper rapport with respondents that would
facilitate free and frank responses. This is often a very difficult requirement.

Pre-requisites and basic tenets of interviewing: For successful implementation of the
interview method, interviewers should be carefully selected, trained and briefed. They
should be honest, sincere, hardworking, impartial and must possess the technical
competence and necessary practical experience. Occasional field checks should be
made to ensure that interviewers are neither cheating, nor deviating from instructions
given to them for performing their job efficiently. In addition, some provision should
also be made in advance so that appropriate action may be taken if some of the
selected respondents refuse to cooperate or are not available when an interviewer calls
upon them.
In fact, interviewing is an art governed by certain scientific principles. Every effort
should be made to create a friendly atmosphere of trust and confidence, so that
respondents may feel at ease while talking to and discussing with the interviewer. The
interviewer must ask questions properly and intelligently and must record the
responses accurately and completely. At the same time, the interviewer must answer
legitimate question(s), if any, asked by the respondent and must clear any doubt that
the latter has. The interviewer’s approach must be friendly, courteous, conversational
and unbiased. The interviewer should not show surprise or disapproval of a
respondent’s answer but he must keep the direction of interview in his own hand,
discouraging irrelevant conversation and must make all possible effort to keep the
respondent on the track.

Telephone interviews: This method of collecting information consists in contacting
respondents on the telephone. It is not a very widely used method, but plays
an important part in industrial surveys, particularly in developed regions. The chief
merits of such a system are:

1. It is more flexible in comparison to mailing method.

2. It is faster than other methods i.e., a quick way of obtaining information.

3. It is cheaper than personal interviewing method; here the cost per response is
relatively low.

4. Recall is easy; callbacks are simple and economical.

5. There is a higher rate of response than what we have in mailing method; the
non-response is generally very low.

6. Replies can be recorded without causing embarrassment to respondents.

7. Interviewer can explain requirements more easily.

8. At times, access can be gained to respondents who otherwise cannot be
contacted for one reason or the other.

9. No field staff is required.

10. Representative and wider distribution of sample is possible.

But this system of collecting information is not free from demerits. Some of these
may be highlighted.

1. Little time is given to respondents for considered answers; interview period is
not likely to exceed five minutes in most cases.

2. Surveys are restricted to respondents who have telephone facilities.

3. Extensive geographical coverage may get restricted by cost considerations.

4. It is not suitable for intensive surveys where comprehensive answers are
required to various questions.

5. Possibility of the bias of the interviewer is relatively more.

6. Questions have to be short and to the point; probes are difficult to handle.

COLLECTION OF DATA THROUGH QUESTIONNAIRES

This method of data collection is quite popular, particularly in case of big enquiries. It
is being adopted by private individuals, research workers, private and public
organizations and even by governments. In this method a questionnaire is sent
(usually by post) to the persons concerned with a request to answer the questions and
return the questionnaire. A questionnaire consists of a number of questions printed or
typed in a definite order on a form or set of forms. The questionnaire is mailed to
respondents who are expected to read and understand the questions and write down
the reply in the space meant for the purpose in the questionnaire itself. The
respondents have to answer the questions on their own.
The method of collecting data by mailing the questionnaires to respondents is most
extensively employed in various economic and business surveys. The merits claimed
on behalf of this method are as follows:

1. There is low cost even when the universe is large and is widely spread
geographically.

2. It is free from the bias of the interviewer; answers are in respondents’ own
words.

3. Respondents have adequate time to give well thought out answers.

4. Respondents who are not easily approachable can also be reached
conveniently.

5. Large samples can be made use of and thus the results can be made more
dependable and reliable.

The main demerits of this system can also be listed here:

1. Low rate of return of the duly filled in questionnaires; bias due to non-response
is often indeterminate.

2. It can be used only when respondents are educated and cooperative.

3. The control over questionnaire may be lost once it is sent.

4. There is inbuilt inflexibility because of the difficulty of amending the approach
once questionnaires have been despatched.

5. There is also the possibility of ambiguous replies or omission of replies
altogether to certain questions; interpretation of omissions is difficult.

6. It is difficult to know whether willing respondents are truly representative.

7. This method is likely to be the slowest of all.

Before using this method, it is always advisable to conduct a ‘pilot study’ (pilot survey)
for testing the questionnaires. In a big enquiry the significance of pilot survey is felt
very much. Pilot survey is in fact the replica and rehearsal of the main survey. Such a
survey, being conducted by experts, brings to light the weaknesses (if any) of the
questionnaires and also of the survey techniques. From the experience gained in this
way, improvement can be effected.
Main aspects of a questionnaire: Quite often the questionnaire is considered the heart
of a survey operation. Hence it should be very carefully constructed. If it is not
properly set up, then the survey is bound to fail. This fact requires us to study the
main aspects of a questionnaire viz., the general form, question sequence and question
formulation and wording. The researcher should note the following with regard to these
three main aspects of a questionnaire:
General form: So far as the general form of a questionnaire is concerned, it can either
be structured or unstructured questionnaire. Structured questionnaires are those
questionnaires in which there are definite, concrete and pre-determined questions. The
questions are presented with exactly the same wording and in the same order to all
respondents. Resort is taken to this sort of standardization to ensure that all
respondents reply to the same set of questions. The form of the question may be either
closed (i.e., of the type ‘yes’ or ‘no’) or open (i.e., inviting free response) but should
be stated in advance and not constructed during questioning. Structured questionnaires
may also have fixed alternative questions in which responses of the informants are
limited to the stated alternatives. Thus a highly structured questionnaire is one in
which all questions and answers are specified and comments in the respondent’s own
words are held to the minimum. When these characteristics are not present in a
questionnaire, it can be termed as unstructured or non-structured questionnaire. More
specifically, we can say that in an unstructured questionnaire, the interviewer is
provided with a general guide on the type of information to be obtained, but the exact
question formulation is largely his own responsibility and the replies are to be taken
down in the respondent’s own words to the extent possible; in some situations tape
recorders may be used to achieve this goal.
Structured questionnaires are simple to administer and relatively inexpensive to
analyze. The provision of alternative replies, at times, helps to understand the
meaning of the question clearly. But such questionnaires have limitations too. For
instance, a wide range of data, and that too in the respondent’s own words, cannot be
obtained with structured questionnaires. They are usually considered inappropriate in
investigations where the aim happens to be to probe for attitudes and reasons for
certain actions or feelings. They are equally not suitable when a problem is being first
explored and working hypotheses sought. In such situations, unstructured
questionnaires may be used effectively. Then on the basis of the results obtained in
pretest (testing before final use) operations from the use of unstructured
questionnaires, one can construct a structured questionnaire for use in the main study.

Question sequence: In order to make the questionnaire effective and to ensure the quality
of the replies received, a researcher should pay attention to the question-sequence in
preparing the questionnaire. A proper sequence of questions reduces considerably the
chances of individual questions being misunderstood. The question-sequence must be
clear and smoothly-moving, meaning thereby that the relation of one question to
another should be readily apparent to the respondent, with questions that are easiest to
answer being put in the beginning. The first few questions are particularly important
because they are likely to influence the attitude of the respondent and the chances of
securing his cooperation. The opening questions should be such as to arouse human
interest. The following type of questions should generally be avoided as opening
questions in a questionnaire:

1. questions that put too great a strain on the memory or intellect of the
respondent;

2. questions of a personal character;

3. questions related to personal wealth, etc.

Following the opening questions, we should have questions that are really vital to the
research problem and a connecting thread should run through successive questions.
Ideally, the question-sequence should conform to the respondent’s way of thinking.
Knowing what information is desired, the researcher can rearrange the order of the
questions (this is possible in case of unstructured questionnaire) to fit the discussion in
each particular case. But in a structured questionnaire the best that can be done is to
determine the question-sequence with the help of a Pilot Survey which is likely to
produce good rapport with most respondents. Relatively difficult questions must be
relegated towards the end so that even if the respondent decides not to answer such
questions, considerable information would have already been obtained. Thus,
question-sequence should usually go from the general to the more specific and the
researcher must always remember that the answer to a given question is a function not
only of the question itself, but of all previous questions as well. For instance, if one
question deals with the price usually paid for coffee and the next with reason for
preferring that particular brand, the answer to this latter question may be couched
largely in terms of price differences.

Question formulation and wording: With regard to this aspect of questionnaire, the
researcher should note that each question must be very clear for any sort of
misunderstanding can do irreparable harm to a survey. Questions should also be
impartial in order not to give a biased picture of the true state of affairs. Questions
should be constructed with a view to their forming a logical part of a well thought out
tabulation plan. In general, all questions should meet the following standards:

a. should be easily understood;

b. should be simple i.e., should convey only one thought at a time;

c. should be concrete and should conform as much as possible to the respondent’s
way of thinking.

For instance, instead of asking, “How many razor blades do you use annually?” the
more realistic question would be to ask, “How many razor blades did you use last
week?”
Concerning the form of questions, we can talk about two principal forms, viz.,
multiple choice question and the open-end question. In the former the respondent
selects one of the alternative possible answers put to him, whereas in the latter he has
to supply the answer in his own words. The question with only two possible answers
(usually ‘Yes’ or ‘No’) can be taken as a special case of the multiple choice question,
or can be named as a ‘closed question.’ There are some advantages and disadvantages
of each possible form of question. Multiple choice or closed questions have the
advantages of being easy to handle, simple to answer, and quick and relatively
inexpensive to analyze. They are most amenable to statistical analysis. Sometimes, the provision of
alternative replies helps to make clear the meaning of the question. But the main
drawback of fixed alternative questions is that of “putting answers in people’s
mouths” i.e., they may force a statement of opinion on an issue about which the
respondent does not in fact have any opinion. They are not appropriate when the issue
under consideration happens to be a complex one and also when the interest of the
researcher is in the exploration of a process. In such situations, open-ended questions
which are designed to permit a free response from the respondent rather than one
limited to certain stated alternatives are considered appropriate. Such questions give
the respondent considerable latitude in phrasing a reply. Getting the replies in
respondent’s own words is, thus, the major advantage of open-ended questions. But
one should not forget that, from an analytical point of view, open-ended questions are
more difficult to handle, raising problems of interpretation, comparability and
interviewer bias.
In practice, one rarely comes across a case when one questionnaire relies on one form
of questions alone. The various forms complement each other. As such questions of
different forms are included in one single questionnaire. For instance, multiple-choice
questions constitute the basis of a structured questionnaire, particularly in a mail
survey. But even there, various open-ended questions are generally inserted to provide
a more complete picture of the respondent’s feelings and attitudes.
The researcher must pay proper attention to the wording of questions since reliable and
meaningful returns depend on it to a large extent. Since words are likely to affect
responses, they should be properly chosen. Simple words, which are familiar to all
respondents should be employed. Words with ambiguous meanings must be avoided.
Similarly, danger words, catch-words or words with emotional connotations should be
avoided. Caution must also be exercised in the use of phrases which reflect upon the
prestige of the respondent. Question wording, in no case, should bias the answer. In
fact, question wording and formulation is an art and can only be learnt by practice.
Essentials of a good questionnaire: To be successful, a questionnaire should be
comparatively short and simple i.e., the size of the questionnaire should be kept to the
minimum. Questions should proceed in logical sequence moving from easy to more
difficult questions. Personal and intimate questions should be left to the end.
Technical terms and vague expressions capable of different interpretations should be
avoided in a questionnaire. Questions may be dichotomous (yes or no answers),
multiple choice (alternative answers listed) or open-ended. The last type of question
is often difficult to analyze and hence should be avoided in a questionnaire to the
extent possible. There should be some control questions in the questionnaire which
indicate the reliability of the respondent. For instance, a question designed to
determine the consumption of particular material may be asked first in terms of
financial expenditure and later in terms of weight. The control questions, thus,
introduce a cross-check to see whether the information collected is correct or not.
Questions affecting the sentiments of respondents should be avoided. Adequate space
for answers should be provided in the questionnaire to help editing and tabulation.
There should always be provision for indications of uncertainty, e.g., “do not know,”
“no preference” and so on. Brief directions with regard to filling up the questionnaire
should invariably be given in the questionnaire itself. Finally, the physical appearance
of the questionnaire affects the cooperation the researcher receives from the recipients
and as such an attractive looking questionnaire, particularly in mail surveys, is a plus
point for enlisting cooperation. The quality of the paper, along with its color, must be
good so that it may attract the attention of recipients.
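
The cross-check that control questions provide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the price per kilogram and the 25 per cent tolerance are hypothetical and would have to be chosen by the researcher for the material in question.

```python
from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    monthly_spend: float      # answer to the expenditure question
    monthly_weight_kg: float  # answer to the later weight question


def flag_inconsistent(responses, price_per_kg, tolerance=0.25):
    """Return ids whose two answers disagree by more than `tolerance`.

    The spend answer is converted to an implied weight using an assumed
    price, then compared with the stated weight as a relative difference.
    """
    flagged = []
    for r in responses:
        implied_kg = r.monthly_spend / price_per_kg
        baseline = max(implied_kg, r.monthly_weight_kg)
        if baseline == 0:
            continue  # both answers are zero, so they agree
        gap = abs(implied_kg - r.monthly_weight_kg) / baseline
        if gap > tolerance:
            flagged.append(r.respondent_id)
    return flagged


# Hypothetical replies; the price of 200 per kg is an assumption.
sample = [
    Response("R1", monthly_spend=400.0, monthly_weight_kg=2.0),
    Response("R2", monthly_spend=400.0, monthly_weight_kg=5.0),
    Response("R3", monthly_spend=0.0, monthly_weight_kg=0.0),
]
print(flag_inconsistent(sample, price_per_kg=200.0))
```

A flagged respondent is not necessarily unreliable; the flag only marks replies that deserve a second look during editing.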

COLLECTION OF SECONDARY DATA

Secondary data means data that are already available i.e., they refer to the data which
have already been collected and analysed by someone else. When the researcher
utilises secondary data, then he has to look into various sources from where he can
obtain them. In this case he is certainly not confronted with the problems that are
usually associated with the collection of original data. Secondary data may either be
published data or unpublished data. Usually published data are available in: (a)
various publications of the central, state or local governments; (b) various
publications of foreign governments or of international bodies and their subsidiary
organisations; (c) technical and trade journals; (d) books, magazines and newspapers;
(e) reports and publications of various associations connected with business and
industry, banks, stock exchanges, etc.; (f) reports prepared by research scholars,
universities, economists, etc. in different fields; and (g) public records and statistics,
historical documents, and other sources of published information. The sources of
unpublished data are many; they may be found in diaries, letters, unpublished
biographies and autobiographies and also may be available with scholars and research
workers, trade associations, labour bureaus and other public/ private individuals and
organisations.

The researcher must be very careful in using secondary data. He must make a minute
scrutiny because it is quite possible that the secondary data may be unsuitable or may
be inadequate in the context of the problem which the researcher wants to study. In
this connection Dr. A.L. Bowley very aptly observes that it is never safe to take
published statistics at their face value without knowing their meaning and limitations
and it is always necessary to criticise arguments that can be based on them.

By way of caution, the researcher, before using secondary data, must see that they
possess the following characteristics:

1. Reliability of data: The reliability can be tested by finding out such things
about the said data: (a) Who collected the data? (b) What were the sources of
data? (c) Were they collected by using proper methods? (d) At what time were
they collected? (e) Was there any bias of the compiler? (f) What level of
accuracy was desired? Was it achieved?

2. Suitability of data: The data that are suitable for one enquiry may not
necessarily be found suitable in another enquiry. Hence, if the available data
are found to be unsuitable, they should not be used by the researcher. In this
context, the researcher must very carefully scrutinise the definition of various
terms and units of collection used at the time of collecting the data from the
primary source originally. Similarly, the object, scope and nature of the original
enquiry must also be studied. If the researcher finds differences in these, the
data will remain unsuitable for the present enquiry and should not be used.

3. Adequacy of data: If the level of accuracy achieved in data is found
inadequate for the purpose of the present enquiry, they will be considered as
inadequate and should not be used by the researcher. The data will also be
considered inadequate, if they are related to an area which may be either
narrower or wider than the area of the present enquiry.

From all this we can say that it is very risky to use the already available data.
The already available data should be used by the researcher only when he finds
them reliable, suitable and adequate. But he should not blindly discard the use
of such data if they are readily available from authentic sources and are also
suitable and adequate, for in that case it will not be economical to spend time
and energy in field surveys for collecting information. At times, there may be
a wealth of usable information in the already available data which must be used
by an intelligent researcher but with due precaution.

ELEMENTS/TYPES OF ANALYSIS

As stated earlier, by analysis we mean the computation of certain indices or
measures along with searching for patterns of relationship that exist among the
data groups. Analysis, particularly in case of survey or experimental data, involves
estimating the values of unknown parameters of the population and testing of
hypotheses for drawing inferences. Analysis may, therefore, be categorised as
descriptive analysis and inferential analysis (inferential analysis is often known as
statistical analysis). “Descriptive analysis is largely the study of distributions of
one variable. This study provides us with profiles of companies, work groups,
persons and other subjects on any of a multiple of characteristics such as size,
composition, efficiency, preferences, etc.”² This sort of analysis may be in respect
of one variable (described as unidimensional analysis), or in respect of two
variables (described as bivariate analysis) or in respect of more than two variables
(described as multivariate analysis). In this context we work out various measures
that show the size and shape of a distribution(s) along with the study of measuring
relationships between two or more variables.

We may as well talk of correlation analysis and causal analysis. Correlation
analysis studies the joint variation of two or more variables for determining the
amount of correlation between them. Causal analysis is concerned
with the study of how one or more variables affect changes in another variable. It
is thus a study of functional relationships existing between two or more variables.
This analysis can be termed as regression analysis. Causal analysis is considered
relatively more important in experimental researches, whereas in most social and
business researches our interest lies in understanding and controlling relationships
between variables rather than in determining causes per se, and as such we consider
correlation analysis as relatively more important.

In modern times, with the availability of computer facilities, there has been a rapid
development of multivariate analysis which may be defined as “all statistical
methods which simultaneously analyse more than two variables on a sample of
observations”. Usually the following analyses are involved when we refer to
multivariate analysis:

• Multiple regression analysis: This analysis is adopted when the researcher has
one dependent variable which is presumed to be a function of two or more
independent variables. The objective of this analysis is to make a prediction
about the dependent variable based on its covariance with all the concerned
independent variables.

• Multiple discriminant analysis: This analysis is appropriate when the researcher
has a single dependent variable that cannot be measured, but can be classified
into two or more groups on the basis of some attribute. The object of this
analysis happens to be to predict an entity’s possibility of belonging to a
particular group based on several predictor variables.

• Multivariate analysis of variance (or multi-ANOVA): This analysis is an
extension of two-way ANOVA, wherein the ratio of among group variance to
within group variance is worked out on a set of variables.

• Canonical analysis: This analysis can be used in case of both measurable and
non-measurable variables for the purpose of simultaneously predicting a set of
dependent variables from their joint covariance with a set of independent
variables.

Inferential analysis is concerned with the various tests of significance for
testing hypotheses in order to determine with what validity data can be said to
indicate some conclusion or conclusions. It is also concerned with the
estimation of population values. It is mainly on the basis of inferential analysis
that the task of interpretation (i.e., the task of drawing inferences and
conclusions) is performed.

STATISTICS IN RESEARCH

The role of statistics in research is to function as a tool in designing research,
analyzing its data and drawing conclusions therefrom. Most research studies result
in a large volume of raw data which must be suitably reduced so that the same can
be read easily and can be used for further analysis. Clearly the science of statistics
cannot be ignored by any research worker, even though he may not have occasion
to use statistical methods in all their details and ramifications. Classification and
tabulation, as stated earlier, achieve this objective to some extent, but we have to
go a step further and develop certain indices or measures to summarize the
collected/classified data. Only after this can we adopt the process of generalization
from small groups (i.e., samples) to population. In fact, there are two major areas
of statistics viz., descriptive statistics and inferential statistics. Descriptive
statistics concern the development of certain indices from the raw data, whereas
inferential statistics concern the process of generalization. Inferential statistics
are also known as sampling statistics and are mainly concerned with two major
types of problems:

i. the estimation of population parameters, and

ii. the testing of statistical hypotheses.

The important statistical measures* that are used to summarize the survey/research
data are:

1. measures of central tendency or statistical averages;

2. measures of dispersion;

3. measures of asymmetry (skewness);

4. measures of relationship; and

5. other measures.

Amongst the measures of central tendency, the three most important ones are the
arithmetic average or mean, median and mode. Geometric mean and harmonic
mean are also sometimes used.

From among the measures of dispersion, variance and its square root, the
standard deviation, are the most often used measures. Other measures such as mean
deviation, range, etc. are also used. For comparison purposes, we mostly use the
coefficient of standard deviation or the coefficient of variation.

In respect of the measures of skewness and kurtosis, we mostly use the first
measure of skewness based on mean and mode or on mean and median. Other
measures of skewness, based on quartiles or on the methods of moments, are also
used sometimes. Kurtosis is also used to measure the peakedness of the curve of
the frequency distribution.

Amongst the measures of relationship, Karl Pearson’s coefficient of correlation is
the most frequently used measure in case of statistics of variables, whereas Yule’s
coefficient of association is used in case of statistics of attributes. Multiple
correlation coefficient, partial correlation coefficient, regression analysis, etc., are
other important measures often used by a researcher.

Index numbers, analysis of time series, coefficient of contingency, etc., are other
measures that may as well be used by a researcher, depending upon the nature of
the problem under study.

We give below a brief outline of some important measures (out of the above listed
measures) often used in the context of research studies.
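
As a small illustration of the averages named in this section, the following Python sketch computes the mean, median, mode, geometric mean and harmonic mean with the standard-library statistics module. The income figures are hypothetical and serve only to show the calls.

```python
import statistics

# Hypothetical sample: monthly incomes (in thousands) of seven respondents.
incomes = [12, 15, 15, 18, 20, 24, 36]

mean = statistics.mean(incomes)                  # arithmetic average
median = statistics.median(incomes)              # middle item of the ordered series
mode = statistics.mode(incomes)                  # most frequently occurring value
geometric = statistics.geometric_mean(incomes)   # n-th root of the product of items
harmonic = statistics.harmonic_mean(incomes)     # reciprocal of the mean of reciprocals

print("mean:", mean, "median:", median, "mode:", mode)
print("geometric mean:", round(geometric, 2), "harmonic mean:", round(harmonic, 2))
```

For positive values the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean, so the three can serve as a quick consistency check on one another.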

MEASURES OF DISPERSION

An average can represent a series only as well as a single figure can, but it
certainly cannot reveal the entire story of any phenomenon under study. In particular,
it fails to give any idea about the scatter of the values of items of a variable in the
series around the true value of the average. In order to measure this scatter, statistical
devices called measures of dispersion are calculated. Important measures of
dispersion are

a. range,

b. mean deviation, and

c. standard deviation.

Range is the simplest possible measure of dispersion and is defined as the
difference between the values of the extreme items of a series. Thus,

Range = Highest value of an item in a series − Lowest value of an item in a series

The utility of range is that it gives an idea of the variability very quickly, but the
drawback is that range is affected very greatly by fluctuations of sampling. Its
value is never stable, being based on only two values of the variable. As such,
range is mostly used as a rough measure of variability and is not considered as an
appropriate measure in serious research studies.

Mean deviation is the average of the differences of the values of items from some
average of the series. Such a difference is technically described as deviation. In
calculating mean deviation we ignore the minus sign of deviations while taking
their total for obtaining the mean deviation. Mean deviation is, thus, obtained as
under:
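
Following the definition just given, and writing X_i for the items, n for their number and A for whichever average (mean, median or mode) is chosen, the mean deviation can be expressed as follows. The symbols are assumed here for illustration and are not taken from the text.

```latex
\[
\text{Mean deviation} \;=\; \frac{\sum_{i=1}^{n} \lvert X_i - A \rvert}{n}
\]
```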

When mean deviation is divided by the average used in finding out the mean
deviation itself, the resulting quantity is described as the coefficient of mean
deviation. Coefficient of mean deviation is a relative measure of dispersion and is
comparable to the similar measure of other series. Mean deviation and its coefficient
are used in statistical studies for judging the variability, and thereby render the
study of central tendency of a series more precise by throwing light on the
typicalness of an average. It is a better measure of variability than range as it takes
into consideration the values of all items of a series. Even then it is not a
frequently used measure as it is not amenable to algebraic treatment.

Standard deviation is the most widely used measure of dispersion of a series and is
commonly denoted by the symbol ‘σ’ (pronounced as sigma). Standard deviation
is defined as the square-root of the average of squares of deviations, when such
deviations for the values of individual items in a series are obtained from the
arithmetic average. It is worked out as under:
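
Following the verbal definition, with X_i for the items, n for their number and X̄ for the arithmetic average (notation assumed here), the standard deviation can be written as:

```latex
\[
\sigma \;=\; \sqrt{\frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2}{n}}
\]
```

This is the form that divides by n, matching "the average of squares of deviations"; sample-based work often divides by n − 1 instead.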

When we divide the standard deviation by the arithmetic average of the series, the
resulting quantity is known as coefficient of standard deviation which happens to
be a relative measure and is often used for comparing with similar measure of
other series. When this coefficient of standard deviation is multiplied by 100, the
resulting figure is known as coefficient of variation. Sometimes, we work out the
square of standard deviation, known as variance, which is frequently used in the
context of analysis of variation.

The standard deviation (along with several related measures like variance,
coefficient of variation, etc.) is used mostly in research studies and is regarded as a
very satisfactory measure of dispersion in a series. It is amenable to mathematical
manipulation because the algebraic signs are not ignored in its calculation (as we
ignore in case of mean deviation). It is less affected by fluctuations of sampling.
These advantages make standard deviation and its coefficient a very popular
measure of the scatteredness of a series. It is popularly used in the context of
estimation and testing of hypotheses.
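
The measures described in this section can be worked out together in a few lines of Python. This is a minimal sketch: the function name, the dictionary keys and the data are hypothetical, the mean deviation is taken about the arithmetic mean, and the variance divides by n, as in the definition above.

```python
import math

def dispersion_summary(values):
    """Return the dispersion measures discussed above for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # Mean deviation: the signs of the deviations are ignored (absolute values).
    mean_deviation = sum(abs(x - mean) for x in values) / n
    # Variance: average of squared deviations from the arithmetic average.
    variance = sum((x - mean) ** 2 for x in values) / n
    std_dev = math.sqrt(variance)
    coeff_std_dev = std_dev / mean
    return {
        "range": value_range,
        "mean_deviation": mean_deviation,
        "coefficient_of_mean_deviation": mean_deviation / mean,
        "variance": variance,
        "standard_deviation": std_dev,
        "coefficient_of_standard_deviation": coeff_std_dev,
        "coefficient_of_variation": coeff_std_dev * 100,
    }

# Hypothetical series of eight observations.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(scores)
for name, value in summary.items():
    print(name, "=", round(value, 3))
```

For this series the mean is 5, so the sketch gives a range of 7, a mean deviation of 1.5, a variance of 4, a standard deviation of 2 and a coefficient of variation of 40.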

MEASURES OF ASYMMETRY (SKEWNESS)

When the distribution of items in a series happens to be perfectly symmetrical, we
then have the following type of curve for the distribution:

Such a curve is technically described as a normal curve and the relating
distribution as normal distribution. Such a curve is a perfectly bell shaped curve in
which case the value of X̄ (mean), M (median) and Z (mode) is just the same and
skewness is altogether absent. But if the curve is distorted (whether on the right
side or on the left side), we have asymmetrical distribution which indicates that
there is skewness. If the curve is distorted on the right side, we have positive
skewness but when the curve is distorted towards left, we have negative skewness
as shown hereunder:

Skewness is, thus, a measure of asymmetry and shows the manner in which the
items are clustered around the average. In a symmetrical distribution, the items
show a perfect balance on either side of the mode, but in a skew distribution the
balance is thrown to one side. The amount by which the balance exceeds on one
side measures the skewness of the series. The difference between the mean and the
median or the mode provides an easy way of expressing skewness in a series. In
case of positive skewness, we have Z < M < X̄ and in case of negative skewness
we have X̄ < M < Z. Usually we measure skewness in this way:

Skewness = X̄ − Z and its coefficient (j) is worked out as j = (X̄ − Z) / σ

In case Z is not well defined, then we work out skewness as under:

Skewness = 3(X̄ − M) and its coefficient (j) is worked out as j = 3(X̄ − M) / σ

The significance of skewness lies in the fact that through it one can study the
formation of series and can have the idea about the shape of the curve, whether
normal or otherwise, when the items of a given series are plotted on a graph.

Kurtosis is the measure of flat-toppedness of a curve. A bell shaped curve or the
normal curve is Mesokurtic because it is kurtic in the centre; but if the curve is
relatively more peaked than the normal curve, it is called Leptokurtic, whereas if a
curve is flatter than the normal curve, it is called Platykurtic. In brief, Kurtosis
is the humpedness of the curve and points to the nature of distribution of items in
the middle of a series.

It may be pointed out here that knowing the shape of the distribution curve is
crucial to the use of statistical methods in research analysis since most methods
make specific assumptions about the nature of the distribution curve.
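
The two mean-based measures of skewness discussed in this section (mean minus mode, and three times mean minus median, each divided by the standard deviation) can be sketched in Python as follows. The function name and the data are hypothetical, and the standard deviation is the form that divides by n.

```python
import statistics

def pearson_skewness(values):
    """Return (j based on the mode, j based on the median) for a series."""
    mean = statistics.mean(values)       # X-bar
    median = statistics.median(values)   # M
    mode = statistics.mode(values)       # Z
    sigma = statistics.pstdev(values)    # standard deviation, dividing by n
    j_mode = (mean - mode) / sigma
    j_median = 3 * (mean - median) / sigma
    return j_mode, j_median

# Hypothetical series with a longer tail on the right.
series = [2, 4, 4, 4, 5, 5, 7, 9]
j_mode, j_median = pearson_skewness(series)
print("j (mean and mode):  ", j_mode)
print("j (mean and median):", j_median)
```

Here the mode (4) is below the median (4.5), which is below the mean (5), so both coefficients come out positive, in line with the ordering Z < M < X̄ for positive skewness.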

SIMPLE REGRESSION ANALYSIS

Regression is the determination of a statistical relationship between two or more
variables. In simple regression, we have only two variables: one variable (defined
as independent) is the cause of the behavior of another one (defined as dependent
variable). Regression can only interpret what exists physically i.e., there must be a
physical way in which independent variable X can affect dependent variable Y.
The basic relationship between X and Y is given by

Ŷ = a + bX

where the symbol Ŷ denotes the estimated value of Y for a given value of X. This
equation is known as the regression equation of Y on X (it also represents the
regression line of Y on X when drawn on a graph), which means that each unit
change in X produces a change of b in Y, which is positive for direct and negative
for inverse relationships.

The generally used method to find the ‘best’ fit that a straight line of this kind can
give is the least-square method. To use it efficiently, we first determine
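
In the usual deviation notation, with x_i = X_i − X̄ and y_i = Y_i − Ȳ (symbols assumed here; they are not defined in this text), the quantities determined at this step, and the two constants that follow from them under the least-square method, would be:

```latex
\[
\sum x_i^2 = \sum X_i^2 - n\bar{X}^2, \qquad
\sum y_i^2 = \sum Y_i^2 - n\bar{Y}^2, \qquad
\sum x_i y_i = \sum X_i Y_i - n\bar{X}\,\bar{Y}
\]
\[
b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \bar{Y} - b\,\bar{X}
\]
```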
Thus, regression analysis is a statistical method to deal with the formulation of
a mathematical model depicting the relationship amongst variables, which can be used
for the purpose of prediction of the values of the dependent variable, given the values
of the independent variable.

Alternatively, for fitting a regression equation of the type Ŷ = a + bX to the
given values of X and Y variables, we can find the values of the two constants viz.,
a and b by using the following two normal equations:
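
For a straight line fitted by least squares, the two normal equations take the following standard form, where n is the number of paired observations (the index notation is assumed here):

```latex
\[
\sum Y_i = n\,a + b \sum X_i
\]
\[
\sum X_i Y_i = a \sum X_i + b \sum X_i^2
\]
```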

and then solving these equations for finding a and b values. Once these values are
obtained and have been put in the equation Ŷ = a + bX, we say that we have
fitted the regression equation of Y on X to the given data. In a similar fashion, we
can develop the regression equation of X on Y.
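
The fitting procedure can be sketched in Python as below. This is an illustration under stated assumptions, not a definitive implementation: the function names and the advertising/sales figures are hypothetical, and the constants are obtained from the deviation sums, which is equivalent to solving the two normal equations.

```python
def fit_simple_regression(xs, ys):
    """Fit Y-hat = a + bX by least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the respective means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx          # change in Y per unit change in X
    a = mean_y - b * mean_x      # intercept
    return a, b

def predict(a, b, x):
    """Estimated value of Y for a given value of X."""
    return a + b * x

# Hypothetical data: advertising spend (X) and sales (Y).
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = fit_simple_regression(advertising, sales)
print("a =", a, "b =", b)
print("estimated sales at X = 6:", predict(a, b, 6))
```

Because the hypothetical points lie exactly on a line, the fit recovers a = 1 and b = 2; with real survey data the fitted line would pass through the scatter rather than through every point.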

PARTIAL CORRELATION

Partial correlation measures separately the relationship between two variables in
such a way that the effects of other related variables are eliminated. In other words,
in partial correlation analysis, we aim at measuring the relation between a
dependent variable and a particular independent variable by holding all other
variables constant. Thus, each partial coefficient of correlation measures the effect
of its independent variable on the dependent variable. To obtain it, it is first
necessary to compute the simple coefficients of correlation between each set of
pairs of variables as stated earlier. In the case of two independent variables, we
shall have two partial correlation coefficients denoted r_yx1·x2 and r_yx2·x1
which are worked out as under:
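
A standard expression that matches the interpretation given next is the squared partial coefficient written in terms of the multiple coefficient of determination. The symbol R² for the latter is assumed here and is not defined in this text.

```latex
\[
r^2_{yx_1 \cdot x_2} \;=\; \frac{R^2_{y \cdot x_1 x_2} - r^2_{yx_2}}{1 - r^2_{yx_2}}
\]
```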

This measures the effect of X1 on Y, more precisely, that proportion of the
variation of Y not explained by X2 which is explained by X1. Also, the
coefficient for X2 is obtained in the same way with X1 and X2 interchanged.
The formulae of the alternative approach are based on simple coefficients of
correlation (also known as zero order coefficients since no variable is held constant
when simple correlation coefficients are worked out). The partial correlation
coefficients are called first order coefficients when one variable is held constant as
shown above; they are known as second order coefficients when two variables are
held constant and so on.
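
A first order partial coefficient can be computed directly from the three zero order coefficients, using the standard formula r_yx1·x2 = (r_yx1 − r_yx2 · r_x1x2) / √((1 − r²_yx2)(1 − r²_x1x2)). The Python sketch below assumes this formula; the function names and the coefficient values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Simple (zero order) coefficient of correlation between two series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def first_order_partial(r_ab, r_ac, r_bc):
    """Correlation between a and b with the third variable c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical zero order coefficients for Y, X1 and X2.
r_y_x1 = 0.8
r_y_x2 = 0.6
r_x1_x2 = 0.5

r_yx1_given_x2 = first_order_partial(r_y_x1, r_y_x2, r_x1_x2)
r_yx2_given_x1 = first_order_partial(r_y_x2, r_y_x1, r_x1_x2)
print("r_yx1.x2 =", round(r_yx1_given_x2, 4))
print("r_yx2.x1 =", round(r_yx2_given_x1, 4))
```

In practice the three zero order coefficients would first be obtained from the data, for example with pearson_r applied to each pair of series, and then passed to first_order_partial.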
