Methods of Data Collection Collection of Primary Data
i. observation method,
ii. interview method,
iii. through questionnaires,
iv. through schedules, and
v. other methods which include
a. warranty cards;
b. distributor audits;
c. pantry audits;
d. consumer panels;
e. using mechanical devices;
f. through projective techniques;
g. depth interviews, and
h. content analysis.
The observation method is the most commonly used method, especially in studies
relating to behavioral sciences. In a way we all observe things around us, but this sort
of observation is not scientific observation. Observation becomes a scientific tool and
the method of data collection for the researcher, when it serves a formulated research
purpose, is systematically planned and recorded and is subjected to checks and
controls on validity and reliability. Under the observation method, the information is
sought by way of the investigator’s own direct observation, without asking the
respondent. For instance, in a study relating to consumer behavior, the investigator,
instead of asking which brand of wrist watch the respondent uses, may himself look at
the watch. The main advantage of this method is that subjective bias is eliminated, if
observation is done accurately. Secondly, the information obtained under this method
relates to what is currently happening; it is not complicated by either the past behavior
or future intentions or attitudes. Thirdly, this method is independent of respondents’
willingness to respond and as such is relatively less demanding of active cooperation
on the part of respondents as happens to be the case in the interview or the
questionnaire method. This method is particularly suitable in studies which deal with
subjects (i.e., respondents) who are not capable of giving verbal reports of their
feelings for one reason or the other. However, the observation method has various
limitations. Firstly, it is an expensive method. Secondly, the information provided by
this method is very limited. Thirdly, sometimes unforeseen factors may interfere with
the observational task. At times, the fact that some people are rarely accessible to
direct observation creates an obstacle to collecting data effectively by this method.
While using this method, the researcher should keep in mind things like: What should
be observed? How should the observations be recorded? How can the accuracy of
observation be ensured? In case the observation is characterized by a careful
definition of the units to be observed, the style of recording the observed information,
standardized conditions of observation and the selection of pertinent data of
observation, then the observation is called structured observation. But when
observation is to take place without these characteristics being thought of in advance,
the same is termed unstructured observation. Structured observation is considered
appropriate in descriptive studies, whereas in an exploratory study the observational
procedure is most likely to be relatively unstructured.
We often talk about participant and non-participant types of observation in the context
of studies, particularly of social sciences. This distinction depends upon the observer’s
sharing or not sharing the life of the group he is observing. If the observer observes by
making himself, more or less, a member of the group he is observing so that he can
experience what the members of the group experience, the observation is called
participant observation. But when the observer observes as a detached emissary
without any attempt on his part to experience through participation what others feel,
the observation of this type is often termed non-participant observation. (When the
observer is observing in such a manner that his presence may be unknown to the
people he is observing, such an observation is described as disguised observation.)
There are several merits of the participant type of observation: The researcher is
enabled to record the natural behavior of the group. The researcher can even gather
information which could not easily be obtained if he observes in a disinterested
fashion. The researcher can even verify the truth of statements made by informants in
the context of a questionnaire or a schedule. But there are also certain demerits of this
type of observation, viz., the observer may lose objectivity to the extent he
participates emotionally; the problem of observation-control is not solved; and it may
narrow down the researcher’s range of experience.
Sometimes we talk of controlled and uncontrolled observation. If the observation
takes place in the natural setting, it may be termed as uncontrolled observation, but
when observation takes place according to definite pre-arranged plans, involving
experimental procedure, the same is then termed controlled observation. In non-
controlled observation, no attempt is made to use precision instruments. The major
aim of this type of observation is to get a spontaneous picture of life and persons. It
has a tendency to supply naturalness and completeness of behavior, allowing
sufficient time for observing it. But in controlled observation, we use mechanical (or
precision) instruments as aids to accuracy and standardization. Such observation has a
tendency to supply formalized data upon which generalizations can be built with some
degree of assurance. The main pitfall of non-controlled observation is that of
subjective interpretation. There is also the danger of having the feeling that we know
more about the observed phenomena than we actually do. Generally, controlled
observation takes place in various experiments that are carried out in a laboratory or
under controlled conditions, whereas uncontrolled observation is resorted to in case of
exploratory researches.
INTERVIEW METHOD
2. Interviewer by his own skill can overcome the resistance, if any, of the
respondents; the interview method can be made to yield an almost perfect
sample of the general population.
7. The interviewer can usually control which person(s) will answer the questions.
This is not possible in mailed questionnaire approach. If so desired, group
discussions may also be held.
8. The interviewer may catch the informant off-guard and thus may secure the
more spontaneous reactions than would be the case if a mailed questionnaire is
used.
9. The language of the interview can be adapted to the ability or educational level
of the person interviewed and as such misinterpretations concerning questions
can be avoided.
But there are also certain weaknesses of the interview method. Among the important
weaknesses, mention may be made of the following:
2. There remains the possibility of the bias of interviewer as well as that of the
respondent; there also remains the headache of supervision and control of
interviewers.
5. The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to
make the interview interesting.
6. Under the interview method the organization required for selecting, training
and supervising the field-staff is more complex with formidable problems.
3. It is cheaper than personal interviewing method; here the cost per response is
relatively low.
5. There is a higher rate of response than what we have in mailing method; the
non-response is generally very low.
But this system of collecting information is not free from demerits. Some of these
may be highlighted.
6. Questions have to be short and to the point; probes are difficult to handle.
This method of data collection is quite popular, particularly in case of big enquiries. It
is being adopted by private individuals, research workers, private and public
organizations and even by governments. In this method a questionnaire is sent
(usually by post) to the persons concerned with a request to answer the questions and
return the questionnaire. A questionnaire consists of a number of questions printed or
typed in a definite order on a form or set of forms. The questionnaire is mailed to
respondents who are expected to read and understand the questions and write down
the reply in the space meant for the purpose in the questionnaire itself. The
respondents have to answer the questions on their own.
The method of collecting data by mailing the questionnaires to respondents is most
extensively employed in various economic and business surveys. The merits claimed
on behalf of this method are as follows:
1. There is low cost even when the universe is large and is widely spread
geographically.
2. It is free from the bias of the interviewer; answers are in respondents’ own
words.
3. Respondents have adequate time to give well thought out answers.
5. Large samples can be made use of and thus the results can be made more
dependable and reliable.
1. Low rate of return of the duly filled in questionnaires; bias due to non-response
is often indeterminate.
Before using this method, it is always advisable to conduct a ‘pilot study’ (pilot survey)
for testing the questionnaires. In a big enquiry the significance of a pilot survey is felt
very much. A pilot survey is in fact a replica and rehearsal of the main survey. Such a
survey, being conducted by experts, brings to light the weaknesses (if any) of the
questionnaires and also of the survey techniques. From the experience gained in this
way, improvement can be effected.
Main aspects of a questionnaire: Quite often the questionnaire is considered the heart
of a survey operation. Hence it should be very carefully constructed. If it is not
properly set up, then the survey is bound to fail. This fact requires us to study the
main aspects of a questionnaire, viz., the general form, question sequence and question
formulation and wording. The researcher should note the following with regard to these
three main aspects of a questionnaire:
General form: So far as the general form of a questionnaire is concerned, it can be
either a structured or an unstructured questionnaire. Structured questionnaires are those
questionnaires in which there are definite, concrete and pre-determined questions. The
questions are presented with exactly the same wording and in the same order to all
respondents. Resort is taken to this sort of standardization to ensure that all
respondents reply to the same set of questions. The form of the question may be either
closed (i.e., of the type ‘yes’ or ‘no’) or open (i.e., inviting free response) but should
be stated in advance and not constructed during questioning. Structured questionnaires
may also have fixed alternative questions in which responses of the informants are
limited to the stated alternatives. Thus a highly structured questionnaire is one in
which all questions and answers are specified and comments in the respondent’s own
words are held to the minimum. When these characteristics are not present in a
questionnaire, it can be termed as unstructured or non-structured questionnaire. More
specifically, we can say that in an unstructured questionnaire, the interviewer is
provided with a general guide on the type of information to be obtained, but the exact
question formulation is largely his own responsibility and the replies are to be taken
down in the respondent’s own words to the extent possible; in some situations tape
recorders may be used to achieve this goal.
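The distinction between closed (fixed-alternative) and open questions in a structured questionnaire can be pictured as a small data structure. The following is only an illustrative sketch: the class names `Question` and `StructuredQuestionnaire`, the sample questions and the validation rule are assumptions made for the example, not part of any standard survey tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    """One pre-determined question (hypothetical structure, for illustration)."""
    text: str
    # None  -> open question inviting a free response.
    # list  -> closed question limited to the stated alternatives.
    alternatives: Optional[List[str]] = None

    def is_closed(self) -> bool:
        return self.alternatives is not None

    def accepts(self, answer: str) -> bool:
        # Closed questions accept only a stated alternative;
        # open questions accept any non-empty reply.
        if self.is_closed():
            return answer in self.alternatives
        return bool(answer.strip())


@dataclass
class StructuredQuestionnaire:
    """Same wording and same order of questions for every respondent."""
    questions: List[Question] = field(default_factory=list)

    def validate(self, answers: List[str]) -> bool:
        # Every question must be answered, in the fixed order.
        if len(answers) != len(self.questions):
            return False
        return all(q.accepts(a) for q, a in zip(self.questions, answers))


form = StructuredQuestionnaire([
    Question("Do you own a wrist watch?", ["yes", "no"]),   # closed
    Question("Which brand do you prefer, and why?"),        # open
])

print(form.validate(["yes", "Brand A, because it is durable"]))
print(form.validate(["maybe", "Brand A"]))
```

The second call fails validation because "maybe" is not among the stated alternatives, which is the sense in which a closed question limits the respondent's reply.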
Structured questionnaires are simple to administer and relatively inexpensive to
analyze. The provision of alternative replies, at times, helps to understand the
meaning of the question clearly. But such questionnaires have limitations too. For
instance, a wide range of data, particularly in the respondent’s own words, cannot be
obtained with structured questionnaires. They are usually considered inappropriate in
investigations where the aim happens to be to probe for attitudes and reasons for
certain actions or feelings. They are equally not suitable when a problem is being first
explored and working hypotheses sought. In such situations, unstructured
questionnaires may be used effectively. Then on the basis of the results obtained in
pretest (testing before final use) operations from the use of unstructured
questionnaires, one can construct a structured questionnaire for use in the main study.
Question sequence: In order to make the questionnaire effective and to ensure quality
to the replies received, a researcher should pay attention to the question-sequence in
preparing the questionnaire. A proper sequence of questions reduces considerably the
chances of individual questions being misunderstood. The question-sequence must be
clear and smoothly-moving, meaning thereby that the relation of one question to
another should be readily apparent to the respondent, with questions that are easiest to
answer being put in the beginning. The first few questions are particularly important
because they are likely to influence the attitude of the respondent and help in securing
his cooperation. The opening questions should be such as to arouse human
interest. The following type of questions should generally be avoided as opening
questions in a questionnaire:
1. questions that put too great a strain on the memory or intellect of the
respondent;
Following the opening questions, we should have questions that are really vital to the
research problem and a connecting thread should run through successive questions.
Ideally, the question-sequence should conform to the respondent’s way of thinking.
Knowing what information is desired, the researcher can rearrange the order of the
questions (this is possible in case of unstructured questionnaire) to fit the discussion in
each particular case. But in a structured questionnaire the best that can be done is to
determine, with the help of a pilot survey, the question-sequence which is likely to
produce good rapport with most respondents. Relatively difficult questions must be
relegated towards the end so that even if the respondent decides not to answer such
questions, considerable information would have already been obtained. Thus,
question-sequence should usually go from the general to the more specific and the
researcher must always remember that the answer to a given question is a function not
only of the question itself, but of all previous questions as well. For instance, if one
question deals with the price usually paid for coffee and the next with reason for
preferring that particular brand, the answer to this latter question may be couched
largely in terms of price differences.
Question formulation and wording: With regard to this aspect of a questionnaire, the
researcher should note that each question must be very clear, for any sort of
misunderstanding can do irreparable harm to a survey. Questions should also be
impartial in order not to give a biased picture of the true state of affairs. Questions
should be constructed with a view to their forming a logical part of a well thought out
tabulation plan. In general, all questions should meet the following standards—
For instance, instead of asking, “How many razor blades do you use annually?” the
more realistic question would be to ask, “How many razor blades did you use last
week?”
Concerning the form of questions, we can talk about two principal forms, viz.,
multiple choice question and the open-end question. In the former the respondent
selects one of the alternative possible answers put to him, whereas in the latter he has
to supply the answer in his own words. The question with only two possible answers
(usually ‘Yes’ or ‘No’) can be taken as a special case of the multiple choice question,
or can be named as a ‘closed question.’ There are some advantages and disadvantages
of each possible form of question. Multiple choice or closed questions have the
advantages of being easy to handle, simple to answer, and quick and relatively
inexpensive to analyze. They are most amenable to statistical analysis. Sometimes, the provision of
alternative replies helps to make clear the meaning of the question. But the main
drawback of fixed alternative questions is that of “putting answers in people’s
mouths” i.e., they may force a statement of opinion on an issue about which the
respondent does not in fact have any opinion. They are not appropriate when the issue
under consideration happens to be a complex one and also when the interest of the
researcher is in the exploration of a process. In such situations, open-ended questions
which are designed to permit a free response from the respondent rather than one
limited to certain stated alternatives are considered appropriate. Such questions give
the respondent considerable latitude in phrasing a reply. Getting the replies in
respondent’s own words is, thus, the major advantage of open-ended questions. But
one should not forget that, from an analytical point of view, open-ended questions are
more difficult to handle, raising problems of interpretation, comparability and
interviewer bias.
In practice, one rarely comes across a case where a questionnaire relies on one form
of question alone. The various forms complement each other. As such, questions of
different forms are included in one single questionnaire. For instance, multiple-choice
questions constitute the basis of a structured questionnaire, particularly in a mail
survey. But even there, various open-ended questions are generally inserted to provide
a more complete picture of the respondent’s feelings and attitudes.
The researcher must pay proper attention to the wording of questions since reliable and
meaningful returns depend on it to a large extent. Since words are likely to affect
responses, they should be properly chosen. Simple words, which are familiar to all
respondents should be employed. Words with ambiguous meanings must be avoided.
Similarly, danger words, catch-words or words with emotional connotations should be
avoided. Caution must also be exercised in the use of phrases which reflect upon the
prestige of the respondent. Question wording, in no case, should bias the answer. In
fact, question wording and formulation is an art and can only be learnt by practice.
Essentials of a good questionnaire: To be successful, a questionnaire should be
comparatively short and simple, i.e., the size of the questionnaire should be kept to the
minimum. Questions should proceed in logical sequence moving from easy to more
difficult questions. Personal and intimate questions should be left to the end.
Technical terms and vague expressions capable of different interpretations should be
avoided in a questionnaire. Questions may be dichotomous (yes or no answers),
multiple choice (alternative answers listed) or open-ended. Questions of the last type
are often difficult to analyze and hence should be avoided in a questionnaire to the
extent possible. There should be some control questions in the questionnaire which
indicate the reliability of the respondent. For instance, a question designed to
determine the consumption of particular material may be asked first in terms of
financial expenditure and later in terms of weight. The control questions, thus,
introduce a cross-check to see whether the information collected is correct or not.
Questions affecting the sentiments of respondents should be avoided. Adequate space
for answers should be provided in the questionnaire to help editing and tabulation.
There should always be provision for indications of uncertainty, e.g., “do not know,”
“no preference” and so on. Brief directions with regard to filling up the questionnaire
should invariably be given in the questionnaire itself. Finally, the physical appearance
of the questionnaire affects the cooperation the researcher receives from the recipients
and as such an attractive looking questionnaire, particularly in mail surveys, is a plus
point for enlisting cooperation. The quality of the paper, along with its color, must be
good so that it may attract the attention of recipients.
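The control-question idea described above (asking about consumption once in terms of expenditure and again in terms of weight) amounts to a simple consistency check. The sketch below is hypothetical: the function name `consistent`, the assumed unit price and the 25% tolerance are illustrative choices, not values given in the text.

```python
def consistent(expenditure, weight_kg, price_per_kg, tolerance=0.25):
    """Cross-check two answers about the same consumption.

    expenditure  : amount the respondent says was spent
    weight_kg    : quantity the respondent says was consumed
    price_per_kg : assumed market price (an assumption of this sketch)
    tolerance    : allowed relative disagreement (arbitrary choice)
    """
    implied_weight = expenditure / price_per_kg
    if weight_kg == 0:
        return implied_weight == 0
    return abs(implied_weight - weight_kg) / weight_kg <= tolerance


# Spending 400 at 200 per kg implies 2 kg, which matches the stated 2 kg.
print(consistent(expenditure=400.0, weight_kg=2.0, price_per_kg=200.0))  # True
# The same spending does not match a stated 5 kg.
print(consistent(expenditure=400.0, weight_kg=5.0, price_per_kg=200.0))  # False
```

A failed check does not show which answer is wrong; it only flags the response for scrutiny at the editing stage.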
Secondary data means data that are already available i.e., they refer to the data which
have already been collected and analysed by someone else. When the researcher
utilises secondary data, then he has to look into various sources from where he can
obtain them. In this case he is certainly not confronted with the problems that are
usually associated with the collection of original data. Secondary data may either be
published data or unpublished data. Usually published data are available in: (a)
various publications of the central, state or local governments; (b) various
publications of foreign governments or of international bodies and their subsidiary
organisations; (c) technical and trade journals; (d) books, magazines and newspapers;
(e) reports and publications of various associations connected with business and
industry, banks, stock exchanges, etc.; (f) reports prepared by research scholars,
universities, economists, etc. in different fields; and (g) public records and statistics,
historical documents, and other sources of published information. The sources of
unpublished data are many; they may be found in diaries, letters, unpublished
biographies and autobiographies and also may be available with scholars and research
workers, trade associations, labour bureaus and other public/ private individuals and
organisations.
The researcher must be very careful in using secondary data. He must make a minute
scrutiny because it is just possible that the secondary data may be unsuitable or may
be inadequate in the context of the problem which the researcher wants to study. In
this connection Dr. A.L. Bowley very aptly observes that it is never safe to take
published statistics at their face value without knowing their meaning and limitations
and it is always necessary to criticise arguments that can be based on them.
By way of caution, the researcher, before using secondary data, must see that they
possess the following characteristics:
1. Reliability of data: The reliability can be tested by finding out such things
about the said data: (a) Who collected the data? (b) What were the sources of
data? (c) Were they collected by using proper methods? (d) At what time were
they collected? (e) Was there any bias of the compiler? (f) What level of
accuracy was desired? Was it achieved?
2. Suitability of data: The data that are suitable for one enquiry may not
necessarily be found suitable in another enquiry. Hence, if the available data
are found to be unsuitable, they should not be used by the researcher. In this
context, the researcher must very carefully scrutinise the definition of various
terms and units of collection used at the time of collecting the data from the
primary source originally. Similarly, the object, scope and nature of the original
enquiry must also be studied. If the researcher finds differences in these, the
data will remain unsuitable for the present enquiry and should not be used.
3. Adequacy of data: If the level of accuracy achieved in data is found
inadequate for the purpose of the present enquiry, they will be considered as
inadequate and should not be used by the researcher. The data will also be
considered inadequate, if they are related to an area which may be either
narrower or wider than the area of the present enquiry.
From all this we can say that it is very risky to use the already available data.
The already available data should be used by the researcher only when he finds
them reliable, suitable and adequate. But he should not blindly discard the use
of such data if they are readily available from authentic sources and are also
suitable and adequate, for in that case it will not be economical to spend time
and energy in field surveys for collecting information. At times, there may be
a wealth of usable information in the already available data which must be used
by an intelligent researcher but with due precaution.
ELEMENTS/TYPES OF ANALYSIS
In modern times, with the availability of computer facilities, there has been a rapid
development of multivariate analysis which may be defined as “all statistical
methods which simultaneously analyse more than two variables on a sample of
observations”. Usually the following analyses are involved when we refer
to multivariate analysis:
• Multiple regression analysis: This analysis is adopted when the researcher has
one dependent variable which is presumed to be a function of two or more
independent variables. The objective of this analysis is to make a prediction
about the dependent variable based on its covariance with all the concerned
independent variables.
• Canonical analysis: This analysis can be used in case of both measurable and
non-measurable variables for the purpose of simultaneously predicting a set of
dependent variables from their joint covariance with a set of independent
variables.
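To make the idea of multiple regression concrete, here is a minimal sketch that fits one dependent variable on two independent variables by ordinary least squares. It is an assumption of this example that the coefficients are obtained by solving the normal equations (X'X)b = X'y with plain Gaussian elimination; the data are made up, and the helper names `solve` and `multiple_regression` are illustrative only.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def multiple_regression(rows, y):
    """Return [a, b1, b2, ...] for the model y = a + b1*x1 + b2*x2 + ..."""
    X = [[1.0] + list(r) for r in rows]        # prepend the intercept column
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Made-up observations generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b1, b2 = multiple_regression(rows, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the sample data lie exactly on the assumed plane, the fitted coefficients recover the generating values up to rounding; with real survey data the fit would be approximate.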
The important statistical measures that are used to summarize the survey/research
data are:
2. measures of dispersion;
5. other measures.
Amongst the measures of central tendency, the three most important ones are the
arithmetic average or mean, median and mode. Geometric mean and harmonic
mean are also sometimes used.
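As a small illustration, the measures of central tendency named above can be computed with Python's standard `statistics` module. The data values are made up for the example.

```python
import statistics

data = [4, 8, 8, 10, 20]   # illustrative values only

mean = statistics.mean(data)                # arithmetic average: 50 / 5
median = statistics.median(data)            # middle value of the sorted series
mode = statistics.mode(data)                # most frequently occurring value
geo_mean = statistics.geometric_mean(data)  # n-th root of the product
har_mean = statistics.harmonic_mean(data)   # reciprocal of the mean reciprocal

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("geometric mean:", round(geo_mean, 3))
print("harmonic mean:", round(har_mean, 3))
```

For positive values the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean, so the three figures give a quick sanity check on each other.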
From among the measures of dispersion, variance and its square root, the
standard deviation, are the most often used measures. Other measures such as mean
deviation, range, etc. are also used. For comparison purpose, we use mostly the
coefficient of standard deviation or the coefficient of variation.
In respect of the measures of skewness and kurtosis, we mostly use the first
measure of skewness based on mean and mode or on mean and median. Other
measures of skewness, based on quartiles or on the methods of moments, are also
used sometimes. Kurtosis is also used to measure the peakedness of the curve of
the frequency distribution.
Amongst the measures of relationship, Karl Pearson’s coefficient of correlation is
the most frequently used measure in case of statistics of variables, whereas Yule’s
coefficient of association is used in case of statistics of attributes. Multiple
correlation coefficient, partial correlation coefficient, regression analysis, etc., are
other important measures often used by a researcher.
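The two measures of relationship singled out above can be sketched as follows. The formulas used are the usual textbook ones (product-moment correlation for variables, and Yule's Q = (ad - bc)/(ad + bc) for a 2x2 table of attributes); the function names, the cell labelling a, b, c, d and the sample figures are assumptions made for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation between two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def yule_q(a, b, c, d):
    """Yule's coefficient of association for a 2x2 table of attributes.

    a: both attributes present      b: first present, second absent
    c: first absent, second present d: both absent
    """
    return (a * d - b * c) / (a * d + b * c)


# Variables: made-up advertising spend and sales figures.
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print("r =", round(pearson_r(spend, sales), 3))

# Attributes: made-up counts, e.g. literacy versus employment.
print("Q =", round(yule_q(a=40, b=10, c=20, d=30), 3))
```

Both coefficients range from -1 to +1, with 0 indicating no linear relationship (for r) or no association (for Q).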
Index numbers, analysis of time series, coefficient of contingency, etc., are other
measures that may as well be used by a researcher, depending upon the nature of
the problem under study.
We give below a brief outline of some important measures (out of the above listed
measures) often used in the context of research studies.
MEASURES OF DISPERSION
An average can represent a series only as well as a single figure can, but it
certainly cannot reveal the entire story of any phenomenon under study. In particular,
it fails to give any idea about the scatter of the values of items of a variable in the
series around the true value of average. In order to measure this scatter, statistical
devices called measures of dispersion are calculated. Important measures of
dispersion are
a. range,
b. mean deviation, and
c. standard deviation.
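Mean deviation is the average of the differences of the values of items from some
average of the series. Such a difference is technically described as deviation. In
calculating mean deviation we ignore the minus sign of deviations while taking
their total for obtaining the mean deviation. Mean deviation is, thus, obtained as
under:
Mean deviation $= \frac{\sum |X_i - A|}{n}$, where $X_i$ is the $i$th value of the
variable, $A$ is the average used (mean, median or mode) and $n$ is the number of items.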
When mean deviation is divided by the average used in finding out the mean
deviation itself, the resulting quantity is described as the coefficient of mean
deviation. Coefficient of mean deviation is a relative measure of dispersion and is
comparable to similar measure of other series. Mean deviation and its coefficient
are used in statistical studies for judging the variability, and thereby render the
study of central tendency of a series more precise by throwing light on the
typicalness of an average. It is a better measure of variability than range as it takes
into consideration the values of all items of a series. Even then it is not a
frequently used measure as it is not amenable to algebraic process.
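The calculation described above, averaging the absolute deviations from a chosen average and then dividing by that average, can be sketched as follows. The function name `mean_deviation` and the data are illustrative assumptions; the arithmetic mean is used as the average here, though the median or mode could be passed in instead.

```python
import statistics


def mean_deviation(values, average):
    """Average of the absolute deviations of the values from `average`."""
    return sum(abs(v - average) for v in values) / len(values)


data = [2, 4, 6, 8, 10]                 # illustrative values only
avg = statistics.mean(data)             # 6

md = mean_deviation(data, avg)          # (4 + 2 + 0 + 2 + 4) / 5
coeff_md = md / avg                     # relative measure of dispersion

print("mean deviation:", md)                       # 2.4
print("coefficient of mean deviation:", coeff_md)  # 0.4
```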
When we divide the standard deviation by the arithmetic average of the series, the
resulting quantity is known as coefficient of standard deviation which happens to
be a relative measure and is often used for comparing with similar measure of
other series. When this coefficient of standard deviation is multiplied by 100, the
resulting figure is known as coefficient of variation. Sometimes, we work out the
square of standard deviation, known as variance, which is frequently used in the
context of analysis of variation.
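The chain of quantities just described (standard deviation, its coefficient, the coefficient of variation and the variance) can be illustrated with the standard `statistics` module. This sketch assumes the population form of the standard deviation (dividing by n); the text does not say whether the population or the sample form is intended, and the data are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative values only

mean = statistics.mean(data)             # arithmetic average of the series
sd = statistics.pstdev(data)             # population standard deviation (assumed form)

coeff_sd = sd / mean                     # coefficient of standard deviation
coeff_var = coeff_sd * 100               # coefficient of variation
variance = sd ** 2                       # square of the standard deviation

print("mean:", mean)
print("standard deviation:", sd)
print("coefficient of standard deviation:", coeff_sd)
print("coefficient of variation:", coeff_var)
print("variance:", variance)
```

Because the coefficient of variation is unit-free, it allows the variability of series measured in different units to be compared directly.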
The standard deviation (along with several related measures like variance,
coefficient of variation, etc.) is used mostly in research studies and is regarded as a
very satisfactory measure of dispersion in a series. It is amenable to mathematical
manipulation because the algebraic signs are not ignored in its calculation (as we
ignore in case of mean deviation). It is less affected by fluctuations of sampling.
These advantages make standard deviation and its coefficient a very popular
measure of the scatteredness of a series. It is popularly used in the context of
estimation and testing of hypotheses.
Skewness is, thus, a measure of asymmetry and shows the manner in which the
items are clustered around the average. In a symmetrical distribution, the items
show a perfect balance on either side of the mode, but in a skew distribution the
balance is thrown to one side. The amount by which the balance exceeds on one
side measures the skewness of the series. The difference between the mean ($\bar{X}$),
the median ($M$) and the mode ($Z$) provides an easy way of expressing skewness in a
series. In case of positive skewness, we have $Z < M < \bar{X}$ and in case of negative
skewness we have $\bar{X} < M < Z$. Usually we measure skewness in this way:
Skewness $= \bar{X} - Z$, and its coefficient ($j$) is worked out as $j = \frac{\bar{X} - Z}{\sigma}$, where $\sigma$ is the standard deviation.
In case $Z$ is not well defined, then we work out skewness as under:
Skewness $= 3(\bar{X} - M)$, and its coefficient ($j$) is worked out as $j = \frac{3(\bar{X} - M)}{\sigma}$.
The significance of skewness lies in the fact that through it one can study the
formation of series and can have the idea about the shape of the curve, whether
normal or otherwise, when the items of a given series are plotted on a graph.
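The mean-mode and mean-median measures of skewness discussed above can be tried out on a small series. This is a sketch under stated assumptions: it uses Karl Pearson's coefficients, (mean - mode)/σ and 3(mean - median)/σ, takes σ as the population standard deviation, and falls back to the median-based form when the mode is not unique. The function name and data are illustrative.

```python
import statistics


def pearson_skewness(values):
    """Return Pearson's skewness coefficients for a series.

    'median_based' is always computed; 'mode_based' is added only
    when the series has a single, well-defined mode.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)          # population form (assumption)
    modes = statistics.multimode(values)    # all most-frequent values

    result = {"median_based": 3 * (mean - median) / sd}
    if len(modes) == 1:
        result["mode_based"] = (mean - modes[0]) / sd
    return result


# Made-up series with a long right tail: mode < median < mean.
data = [1, 2, 2, 3, 4, 5, 9]
coefficients = pearson_skewness(data)
print({name: round(value, 3) for name, value in coefficients.items()})
```

Both coefficients come out positive for this series, in line with the ordering mode < median < mean that characterises positive skewness.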
Kurtosis is the measure of flat-toppedness of a curve. A bell shaped curve or the
normal curve is Mesokurtic because it is kurtic in the centre; but if the curve is
relatively more peaked than the normal curve, it is called Leptokurtic, whereas if a
curve is flatter than the normal curve, it is called Platykurtic. In brief, Kurtosis
is the humpedness of the curve and points to the nature of distribution of items in
the middle of a series.
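The text gives no formula for kurtosis, so the following sketch assumes the common moment-based coefficient β2 = m4 / m2², where m2 and m4 are the second and fourth moments about the mean; for a normal curve β2 equals 3. The tolerance band around 3 and the function names are arbitrary choices for the illustration.

```python
def beta2(values):
    """Moment coefficient of kurtosis: fourth moment / (second moment squared)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def classify(values, tol=0.1):
    """Label a series relative to the normal curve (beta2 = 3)."""
    b = beta2(values)
    if b > 3 + tol:
        return "leptokurtic"    # more peaked than the normal curve
    if b < 3 - tol:
        return "platykurtic"    # flatter than the normal curve
    return "mesokurtic"


flat = list(range(1, 11))                 # evenly spread values
peaked = [-10] + [0] * 8 + [10]           # most values piled at the centre

print(round(beta2(flat), 3), classify(flat))
print(round(beta2(peaked), 3), classify(peaked))
```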
It may be pointed out here that knowing the shape of the distribution curve is
crucial to the use of statistical methods in research analysis since most methods
make specific assumptions about the nature of the distribution curve.
The generally used method to find the ‘best’ fit that a straight line of this kind can
give is the least-square method. To use it efficiently, we first determine
Thus, the regression analysis is a statistical method to deal with the formulation of
mathematical model depicting relationship amongst variables which can be used
for the purpose of prediction of the values of dependent variable, given the values
of the independent variable.
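For fitting a regression equation of the type $\hat{Y} = a + bX$, the values of the
constants a and b can be found by forming the two normal equations of the least-square
method, $\sum Y_i = na + b\sum X_i$ and $\sum X_i Y_i = a\sum X_i + b\sum X_i^2$,
and then solving these equations for finding a and b values. Once these values are
obtained and have been put in the equation $\hat{Y} = a + bX$, we say that we have
fitted the regression equation of Y on X to the given data. In a similar fashion, we
can develop the regression equation of X on Y.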
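A minimal sketch of fitting a straight line by the least-square method follows. It assumes the standard normal equations for a line Y = a + bX, namely ΣY = na + bΣX and ΣXY = aΣX + bΣX², solved in closed form; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares constants (a, b) for the line  y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Closed-form solution of the two normal equations.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


X = [1, 2, 3, 4, 5]
Y = [3, 5, 7, 9, 11]           # made-up data lying exactly on Y = 1 + 2X

a, b = fit_line(X, Y)          # regression of Y on X
print("Y on X: a =", a, "b =", b)

a2, b2 = fit_line(Y, X)        # regression of X on Y: swap the roles
print("X on Y: a =", a2, "b =", b2)
```

Swapping the arguments treats Y as the independent variable, which gives the second regression line; the two lines coincide only when the correlation is perfect, as it is in this contrived data set.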
PARTIAL CORRELATION