Module 3 - Measurement and Scaling


MEASUREMENT & SCALING
-Dr Nivedita Roy
School of Management
NIT, Rourkela
Measurement
- We measure physical properties such as height and weight, as well
as abstract ones, such as how well we like a song or a painting.
- By measurement, we mean the process of assigning
numbers to objects or observations.
- Properties like weight and height can be measured
directly with some standard unit of measurement, but it is
not that easy to measure properties like motivation to
succeed, ability to withstand stress and the like.
- We can expect high accuracy in measuring the length of a
pipe with a yardstick, but if the concept is abstract and the
measurement tools are not standardized, we are less
confident about the accuracy of the results of
measurement.
- Nominal data are numerical in name only, because they
do not share any of the properties of the numbers we deal
with in ordinary arithmetic. For instance, if we record marital
status as 1, 2, 3, or 4 (for example single, married, widowed,
or divorced), we cannot meaningfully write 4 > 2 or 3 < 4.

- Nominal data convert abstract phenomena to a measurable
form by representing categories with numbers,
e.g., Male = 1, Female = 2.
- In those situations when we cannot do anything except set
up inequalities, we refer to the data as ordinal data.

For example, hardness numbers from 1 to 10 can be assigned to
ten minerals. If the hardest mineral is numbered 1, then mineral
no. 1 is harder than mineral no. 2, but the numbers do not say
by how much.
- When in addition to setting up inequalities we can also
form differences, we refer to the data as interval data.

For example, heat levels measured in degrees Celsius: 75°,
130°, 145°, 200°, etc.
Here, 130° > 75°; also 130° - 75° = 200° - 145° = 55°.
- When in addition to setting up inequalities and forming
differences, we can also form quotients (i.e., when we can
perform all the customary operations of mathematics), we
refer to such data as ratio data.

- In this sense, ratio data include all the usual measurements
(or determinations) of length, height, money amounts,
weight, volume, area, pressures, etc.
- The stated distinction between nominal, ordinal, interval
and ratio data is important, for the nature of a set of data
may suggest the use of particular statistical techniques.

- A researcher has to be quite alert about this aspect while
measuring properties of objects or of abstract concepts.
Measurement Scales
The most widely used classification of
measurement scales is:
(a) nominal scale;
(b) ordinal scale;
(c) interval scale; and
(d) ratio scale
Nominal Scale
Nominal scale is simply a system of assigning
number symbols to events in order to label them.
The usual example of this is the assignment of
numbers to basketball players in order to identify
them.

Such numbers cannot be considered to be
associated with an ordered scale, for their order is
of no consequence; the numbers are just
convenient labels for the particular class of
events and as such have no quantitative value.
Nominal Scale
Nominal scales provide convenient ways of
keeping track of people, objects and events. One
cannot do much with the numbers involved.
For example, one cannot usefully average the
numbers on the back of a group of football players
and come up with a meaningful value. Neither can
one usefully compare the numbers assigned to one
group with the numbers assigned to another.
The counting of members in each group is the only
possible arithmetic operation when a nominal scale
is employed.
Nominal Scale
Nominal scale is the least powerful level of
measurement. It indicates no order or distance
relationship and has no arithmetic origin.

A nominal scale simply describes differences
between things by assigning them to categories.
Nominal data are, thus, counted data.
Nominal Scale
Accordingly, we are restricted to using the MODE as
the measure of central tendency.

There is no generally used measure of dispersion
for nominal scales. The CHI-SQUARE TEST is the
most common test of statistical significance that
can be utilized, and for the measures of
correlation, the contingency coefficient can be
worked out.
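
The operations named above for nominal data can be sketched in a few lines of Python. This is only an illustration: the category labels, the 2x2 table and the helper names (chi_square, contingency_coefficient) are assumptions, not part of the original slides. The contingency coefficient is computed with the usual formula C = sqrt(chi2 / (chi2 + N)).

```python
from collections import Counter
from math import sqrt

# Hypothetical nominal data: the categories are labels only.
status = ["single", "married", "married", "widowed", "married", "single"]

counts = Counter(status)            # counting members of each category
mode = counts.most_common(1)[0][0]  # the most frequent category


def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(table):
    """C = sqrt(chi2 / (chi2 + N)), a correlation-type measure for nominal data."""
    stat, n = chi_square(table)
    return sqrt(stat / (stat + n))


# Hypothetical cross-tabulation: rows = two groups, columns = two brands.
table = [[30, 10], [10, 30]]
print(mode, chi_square(table)[0], contingency_coefficient(table))
```

Note that nothing here averages or orders the category codes; only counts are used.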
Ordinal Scale

The lowest level of the ordered scale that is
commonly used is the Ordinal scale.

The ordinal scale places events in order, but
there is no attempt to make the intervals of the
scale equal in terms of some rule.
Ordinal Scale
Rank orders represent ordinal scales and are
frequently used in research relating to qualitative
phenomena. A student’s rank in his graduation
class involves the use of an ordinal scale.

One has to be very careful in making statements
about scores based on ordinal scales. For
instance, if Ram’s position in his class is 10 and
Mohan’s position is 40, it cannot be said that
Ram’s position is four times as good as that of
Mohan.
Ordinal Scale

Ordinal scales only permit the ranking of
items from highest to lowest.

Ordinal measures have no absolute values,
and the real differences between adjacent
ranks may not be equal.
Ordinal Scale
All that can be said is that one person is higher or
lower on the scale than another, but more precise
comparisons cannot be made.

Thus, the use of an ordinal scale implies a
statement of ‘greater than’ or ‘less than’ (an
equality statement is also acceptable) without our
being able to state how much greater or less.
Ordinal Scale
Since the numbers of this scale have only a
rank meaning, the appropriate measure of
central tendency is the MEDIAN.

PERCENTILE or quartile measures are used for
measuring dispersion.

Correlations are restricted to various rank
order methods.
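
As a rough sketch of these ordinal statistics, the Python below computes a median, quartiles and one rank order correlation (Spearman's rho, chosen here as an assumed example of a rank order method). The class positions and judge rankings are hypothetical, and the simple ranks helper assumes there are no tied values.

```python
from statistics import median, quantiles

# Hypothetical class positions of five students (ordinal data).
positions = [10, 40, 25, 3, 17]
middle = median(positions)           # central tendency for ranked data
quartiles = quantiles(positions, n=4)  # quartile cut points for dispersion


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2)/(n(n^2-1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rankings of the same five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(middle, quartiles, spearman_rho(judge_a, judge_b))
```

Only the order of the values enters the correlation, which is why it remains valid when the gaps between ranks are unequal.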
Interval Scale
In the case of interval scale, the intervals are
adjusted in terms of some rule that has been
established as a basis for making the units equal.

The units are equal only in so far as one accepts
the assumptions on which the rule is based.

Interval scales can have an arbitrary zero, but it
is not possible to determine for them what may
be called an absolute zero or the unique origin.
Interval Scale
- The primary limitation of the interval scale is the lack of a
true zero; it does not have the capacity to measure the
complete absence of a trait or characteristic.

Example: the Fahrenheit scale shows what one can and
cannot do with an interval scale. One can say that an
increase in temperature from 30° to 40° involves the same
increase in temperature as an increase from 60° to 70°, but
one cannot say that a temperature of 60° is twice as
warm as a temperature of 30°, because both numbers
depend on where the zero of the scale is set, and that zero
is arbitrary: it does not mark the absence of temperature.
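
A small Python check makes the point concrete. It converts the same four Fahrenheit readings to Celsius using the standard conversion formula; the variable names are illustrative. Equal intervals stay equal under the change of scale, but the "twice as warm" ratio does not survive it.

```python
from math import isclose


def fahrenheit_to_celsius(f):
    """Standard conversion between the two interval scales."""
    return (f - 32) * 5 / 9


a, b, c, d = 30, 40, 60, 70  # temperatures in degrees Fahrenheit

# Differences: both intervals are 10 degrees F, and they remain equal in Celsius.
diff_f = (b - a, d - c)
diff_c = (fahrenheit_to_celsius(b) - fahrenheit_to_celsius(a),
          fahrenheit_to_celsius(d) - fahrenheit_to_celsius(c))

# Ratios: 60/30 is 2 in Fahrenheit, but the same two temperatures give a
# completely different (here even negative) ratio in Celsius.
ratio_f = c / a
ratio_c = fahrenheit_to_celsius(c) / fahrenheit_to_celsius(a)

print(diff_f, diff_c, ratio_f, ratio_c)
```

Because the ratio depends on the choice of zero, it carries no meaning on an interval scale.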
Interval Scale
Interval scales provide more powerful
measurement than ordinal scales, for interval scale
also incorporates the concept of equality of
interval.

MEAN is the appropriate measure of central
tendency, while STANDARD DEVIATION is the most
widely used measure of dispersion.

Product moment correlation techniques are
appropriate and the generally used tests for
statistical significance are the ‘t’ test and ‘F’ test.
Ratio Scale
Ratio scales have an absolute or true zero of
measurement.

The ratio involved does have significance and
facilitates a kind of comparison which is not
possible in case of an interval scale.

Ratio scale represents the actual amounts of
variables. Measures of physical dimensions such
as weight, height, distance, etc. are examples.
Ratio Scale

Generally, all statistical techniques are usable
with ratio scales and all manipulations that
one can carry out with real numbers can also
be carried out with ratio scale values.

Multiplication and division can be used with
this scale but not with other scales mentioned
above.
Ratio Scale

GEOMETRIC & HARMONIC MEAN can be
used as measures of central tendency and
coefficients of variation may also be
calculated.
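
The measures named here are available in Python's standard statistics module. The sketch below uses hypothetical weights and also shows one reason the coefficient of variation suits ratio data: it is unchanged when the unit is rescaled (kg to g).

```python
from math import isclose
from statistics import geometric_mean, harmonic_mean, mean, stdev

# Hypothetical ratio-scale data: weights in kilograms.
weights_kg = [2.0, 4.0, 8.0]

gm = geometric_mean(weights_kg)
hm = harmonic_mean(weights_kg)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


cv_kg = coefficient_of_variation(weights_kg)
# Changing the unit multiplies every value by a constant; the CV is unaffected.
cv_g = coefficient_of_variation([w * 1000 for w in weights_kg])

print(gm, hm, cv_kg, cv_g)
```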
Proceeding from the nominal scale (the least
precise type of scale) to the ratio scale (the most
precise), increasingly relevant information is
obtained.
If the nature of the variables permits, the
researcher should use the scale that provides
the most precise description.

E.g., researchers in the physical sciences have the
advantage of describing variables in ratio scale
form, but the behavioural sciences are
generally limited to describing variables in
interval scale form, a less precise type of
measurement.
Primary Scales of Measurement
Sources of error in measurement
Measurement should be precise and unambiguous in an
ideal research study. This objective, however, is often not
met in its entirety.

Possible sources of error are:

- Respondent: the respondent may be reluctant to express
strong negative feelings, or he may have very little
knowledge but may not admit his ignorance.

Transient factors like fatigue, boredom, anxiety, etc. may
limit the ability of the respondent to respond accurately
and fully.
Sources of error in measurement
- Situation: Any situation which places a
strain on an interview can have serious
effects on the interviewer-respondent
rapport.

If someone else is present, he can distort
responses by joining in or merely by being
present.

If the respondent feels that anonymity is not
assured, he may be reluctant to express
certain feelings.
Sources of error in measurement
- Measurer: At times, the interviewer/
measurer may distort responses by rewording
or reordering questions.

His behaviour, style, etc. may encourage or
discourage certain replies from respondents.
Sources of error in measurement

Careless mechanical processing by the
researcher may distort the findings.

Errors may also creep in because of incorrect
coding, faulty tabulation and/or statistical
calculations, particularly in the data-analysis
stage.
Sources of error in measurement

- Instrument: Error may arise because of a
defective measuring instrument.

Examples are the use of complex words beyond the
comprehension of the respondent and questions with
ambiguous meanings.
Sources of error in measurement

Poor printing, inadequate space for replies,
response choice omissions, etc. make the
measuring instrument defective and may
result in measurement errors.

Another type of instrument deficiency is the
poor sampling of the universe of items of
concern.
[Figure: Sources of Error in Measurement: Respondent, Instrument, Measurer, Situation]
The researcher must know that correct measurement
depends on successfully meeting all of the
problems listed.

He must, to the extent possible, try to eliminate,
neutralize or otherwise deal with all the possible
sources of error so that the final results are not
contaminated.
Tests of Sound Measurement
Sound/Proper measurement must meet the
tests of Validity, Reliability and
Practicality.

Validity refers to the extent to which a test
measures what we actually wish to measure.
Reliability has to do with the accuracy and
precision of a measurement procedure.
Practicality is concerned with a wide range
of factors of economy, convenience, and
interpretability.
Test of Validity
Indicates the degree to which an instrument
measures what it is supposed to measure.
Validity can also be thought of as utility.
In other words, validity is the extent to which
differences found with a measuring
instrument reflect true differences among
those being tested.
Test of Validity
Three types:

(i) Content validity;
(ii) Criterion-related validity and
(iii) Construct validity.
Content validity
Is the extent to which a measuring
instrument provides adequate coverage of the
topic under study.

If the instrument contains a representative
sample of the universe, the content validity is
good.

Its determination is primarily judgemental
and intuitive.
Criterion-related validity
relates to our ability to predict some outcome or
estimate the existence of some current condition.

This form of validity reflects the success of
measures used for some empirical estimating
purpose. The concerned criterion must possess
the following qualities:
- Relevance
- Freedom from bias
- Reliability
- Availability
Criterion-related validity
Criterion-related validity is a broad term that
actually refers to:
(i) Predictive validity - usefulness of a test in
predicting some future performance.
(ii) Concurrent validity - usefulness of a test in
closely relating to other measures of known validity.

Criterion-related validity is expressed as the
coefficient of correlation between test scores and
some measure of future performance or between test
scores and scores on another measure of known
validity.
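
To illustrate how such a coefficient might be obtained, the sketch below computes a product moment (Pearson) correlation between test scores and a later performance measure. The data and the names pearson_r, test_scores and later_performance are hypothetical, and using the Pearson coefficient is an assumption made for the example; the slides only say "coefficient of correlation".

```python
from math import sqrt


def pearson_r(x, y):
    """Product moment correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: selection test scores and a later performance rating
# for the same five people (predictive validity).
test_scores = [52, 60, 68, 75, 81]
later_performance = [2.1, 2.6, 2.9, 3.4, 3.8]

validity_coefficient = pearson_r(test_scores, later_performance)
print(round(validity_coefficient, 3))
```

For concurrent validity the second list would instead hold scores on another measure of known validity, collected at about the same time.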
Construct Validity
Is the most complex and abstract.
A measure is said to possess construct validity to the
degree that it conforms to predicted correlations with
other theoretical propositions.
Construct validity is the degree to which scores on a
test can be accounted for by the explanatory
constructs of a sound theory.
For determining construct validity, we associate a set
of other propositions with the results received from
using our measurement instrument. If measurements
on our devised scale correlate in a predicted way with
these other propositions, we can conclude that there is
some construct validity.
If the stated criteria and tests w.r.t.
validity are met with, we may state that
our measuring instrument is valid and
will result in correct measurement; otherwise
we shall have to look for more information
and/or resort to exercise of judgement.
Test of Reliability
A measuring instrument is reliable if it
provides consistent results.

A reliable measuring instrument does
contribute to validity, but a reliable
instrument need not be a valid instrument.
For instance, a scale that consistently
over-weighs objects by five kg is a reliable
scale, but it does not give a valid measure of
weight.
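
The weighing scale example can be expressed numerically. In this sketch the readings and the true weight are made-up values: the small spread of the readings stands for reliability (consistency), and the large bias stands for the lack of validity.

```python
from statistics import mean, pstdev

# Hypothetical repeated readings of one object whose true weight is 60 kg.
true_weight = 60.0
readings = [65.0, 65.1, 64.9, 65.0, 65.0]

spread = pstdev(readings)            # small spread: consistent, hence reliable
bias = mean(readings) - true_weight  # about 5 kg too heavy: not valid

print(f"spread = {spread:.3f} kg, bias = {bias:.1f} kg")
```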
Test of Reliability
But the relationship does not run the same way in
reverse: a valid instrument is always reliable.

Accordingly, reliability is not as valuable as
validity, but it is easier to assess reliability in
comparison to validity.
Test of Reliability
Two aspects of reliability - stability and
equivalence.
The stability aspect is concerned with
securing consistent results with repeated
measurements of the same person and with
the same instrument. We usually determine
the degree of stability by comparing the
results of repeated measurements.
The equivalence aspect considers how
much error may get introduced by
different investigators or different samples
of the items being studied.
Test of Reliability
Reliability can be improved in the following two ways:

(i) By standardising the conditions under which the
measurement takes place, i.e., we must ensure that
external sources of variation such as boredom, fatigue,
etc., are minimised to the extent possible. That will
improve the stability aspect.

(ii) By carefully designed directions for
measurement with no variation from group to
group, by using trained and motivated persons to
conduct the research and also by broadening the sample
of items used. This will improve the equivalence aspect.
Test of Practicality
The practicality characteristic of a
measuring instrument can be judged in terms
of economy, convenience and
interpretability.

From the operational point of view, the
measuring instrument ought to be practical,
i.e., it should be economical, convenient
and interpretable.
Test of Practicality
Economy consideration suggests that some trade-off
is needed between the ideal research project and
that which the budget can afford.
The length of measuring instrument is an
important area where economic pressures are
quickly felt.
Although more items give greater reliability, as
stated earlier, in the interest of limiting the
interview or observation time we have to take
only a few items for our study purpose.
Similarly, data-collection methods to be used are
also dependent at times upon economic factors.
Test of Practicality
Convenience test suggests that the
measuring instrument should be easy to
administer.
For this purpose one should give due
attention to the proper layout of the
measuring instrument.
For instance, a questionnaire, with clear
instructions (illustrated by examples), is
certainly more effective and easier to
complete than one which lacks these
features.
Test of Practicality
Interpretability consideration is especially important
when persons other than the designers of the test are
to interpret the results.

The measuring instrument, in order to be
interpretable, must be supplemented by:
(a) detailed instructions for administering the
test;
(b) scoring keys;
(c) evidence about the reliability and
(d) guides for using the test and for interpreting
results.
A Comparison of Scaling Techniques

Source: (Malhotra & Dash, 2014)


A Comparison of Scaling Techniques
- Comparative scales involve the direct comparison of
stimulus objects. For example, respondents might be
asked whether they prefer Coke or Pepsi.
- Comparative scale data must be interpreted in relative
terms and have only ordinal or rank order properties.
For this reason, comparative scaling is also referred to as
non-metric scaling.
- As shown in Figure 8.2, comparative scales include paired
comparisons, rank order, constant sum scales, Q-sort,
and other procedures.
- Advantage: The major benefit of comparative scaling is
that small differences between objects can be
detected. As they compare the stimulus objects,
respondents are forced to choose between them. In
addition, respondents approach the rating task from the
same known reference points.

- Disadvantage: The major disadvantages of comparative
scales include the ordinal nature of the data and the
inability to generalize beyond the stimulus objects scaled.
For instance, to compare RC Cola to Coke and Pepsi,
the researcher would have to do a new study. These
disadvantages are substantially overcome by the
non-comparative scaling techniques.
A Comparison of Scaling Techniques
- Non-Comparative Scales: In non-comparative scales, also
referred to as monadic or metric scales, each object is
scaled independently of the others in the stimulus set.
The resulting data are generally assumed to be interval
or ratio scaled.
- For example, respondents may be asked to evaluate Coke
on a 1-to-6 preference scale (1 = not at all preferred, 6 =
greatly preferred). Similar evaluations would be obtained
for Pepsi and RC Cola.
- Non-comparative scales can be continuous rating or
itemized rating scales. The itemized rating scales can be
further classified as Likert, Semantic differential, or
Stapel scales.
- Non-comparative scaling is the most widely used scaling
technique in marketing research.
Comparative Scaling Techniques
- A) Paired Comparison Scaling: A comparative scaling
technique in which a respondent is presented with two
objects at a time and asked to select one object in the pair
according to some criterion. The data obtained are ordinal
in nature.
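
As an illustrative sketch, the Python below builds every pair for three brands (n objects give n(n-1)/2 pairs), records one respondent's hypothetical choices, and turns them into a rank order by counting how often each brand was selected. The choices dictionary is invented for the example, and deriving a rank order this way assumes the respondent's preferences are transitive.

```python
from collections import Counter
from itertools import combinations

brands = ["Coke", "Pepsi", "RC Cola"]

# Every object is paired with every other object once.
pairs = list(combinations(brands, 2))

# Hypothetical answers of one respondent: pair -> brand selected.
choices = {
    ("Coke", "Pepsi"): "Coke",
    ("Coke", "RC Cola"): "Coke",
    ("Pepsi", "RC Cola"): "Pepsi",
}

# Count how many times each brand was preferred.
wins = Counter({brand: 0 for brand in brands})
for pair in pairs:
    wins[choices[pair]] += 1

# More wins means a higher position; the result is ordinal only.
rank_order = sorted(brands, key=lambda brand: wins[brand], reverse=True)
print(len(pairs), dict(wins), rank_order)
```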
Comparative Scaling Techniques
- B) Rank Order Scaling: A comparative scaling technique
in which respondents are presented with several objects
simultaneously and asked to order or rank them according
to some criterion.

Source: (Malhotra & Dash, 2014)


Comparative Scaling Techniques
- C) Constant Sum Scaling: A comparative scaling
technique in which respondents are required to allocate
a constant sum of units such as points, dollars, chits,
stickers, or chips among a set of stimulus objects with
respect to some criterion.
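
A response to a constant sum question is only usable if the allocated units add up to the constant. The following sketch checks that and converts the allocation into shares. The total of 100 points, the function name constant_sum_shares and the sample response are assumptions chosen for illustration.

```python
def constant_sum_shares(allocation, total=100):
    """Check a constant sum response and return each object's share of the total.

    `total` is the constant the respondent was asked to distribute
    (100 points is assumed here).
    """
    if any(points < 0 for points in allocation.values()):
        raise ValueError("allocations cannot be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"allocations must add up to {total}")
    return {name: points / total for name, points in allocation.items()}


# Hypothetical response: 100 points divided among three brands.
response = {"Coke": 50, "Pepsi": 30, "RC Cola": 20}
shares = constant_sum_shares(response)
print(shares)
```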

Source: (Malhotra & Dash, 2014)


Comparative Scaling Techniques
- D) Q-Sort Scaling: A comparative scaling technique that
uses a rank order procedure to sort objects based on
similarity with respect to some criterion.
Non-Comparative Scaling Techniques

- A) Continuous Rating Scale: In a continuous rating scale,
also referred to as a graphic rating scale, respondents rate
the objects by placing a mark at the appropriate position
on a line that runs from one extreme of the criterion
variable to the other. Thus, the respondents are not
restricted to selecting from marks previously set by the
researcher.
Non-Comparative Scaling Techniques
- B) Itemized Rating Scales: In an itemized rating scale,
the respondents are provided with a scale that has a
number or brief description associated with each
category. The categories are ordered in terms of scale
position, and the respondents are required to select the
specified category that best describes the object being
rated.

- B1) Likert Scale: Named after its developer, Rensis
Likert, the Likert scale is a widely used rating scale that
requires the respondents to indicate the degree of
agreement or disagreement with each of a series of
statements about the stimulus objects.
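
The slides do not describe how Likert responses are scored. A common approach, assumed here purely for illustration, is to code five response categories from 1 to 5, reverse the coding for negatively worded statements, and sum the item scores for each respondent. The category wording and the function name likert_total are hypothetical.

```python
# Assumed five-point coding; the wording of the categories is illustrative.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_total(responses, reversed_items=()):
    """Sum the item scores of one respondent.

    `reversed_items` holds the positions (starting at 0) of negatively
    worded statements, whose scores are flipped (1 <-> 5, 2 <-> 4).
    """
    total = 0
    for position, answer in enumerate(responses):
        score = LIKERT[answer.lower()]
        if position in reversed_items:
            score = 6 - score
        total += score
    return total


# Hypothetical respondent; the second statement is negatively worded.
answers = ["agree", "strongly disagree", "agree"]
print(likert_total(answers, reversed_items={1}))
```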
Likert Scale:

Source: (Malhotra & Dash, 2014)


Non-Comparative Scaling Techniques
- The semantic differential is a 7-point rating scale
with endpoints associated with bipolar labels that have
semantic meaning.

- In a typical application, respondents rate objects on
a number of itemized, 7-point rating scales bounded
at each end by one of two bipolar adjectives, such as
“cold” and “warm.”

Source: (Malhotra & Dash, 2014)


Non-Comparative Scaling Techniques

- The Stapel scale, named after its developer, Jan Stapel,
is a unipolar rating scale with 10 categories numbered
from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.

- Respondents are asked to indicate how accurately or
inaccurately each term describes the object by
selecting an appropriate numerical response category.

- The higher the number, the more accurately the term
describes the object, as shown in the department store
project.
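
The structure of the Stapel scale is easy to state in code: ten categories from +5 down to -5 with zero left out. The sketch below builds those categories and averages hypothetical ratings per term. The terms, the ratings and the function name stapel_profile are made up, and averaging treats the ratings as interval data, which is an assumption.

```python
# The ten response categories, listed top to bottom: +5 ... +1, -1 ... -5.
STAPEL_CATEGORIES = [value for value in range(5, -6, -1) if value != 0]


def stapel_profile(ratings):
    """Average rating per term; rejects values outside the ten categories."""
    profile = {}
    for term, values in ratings.items():
        for value in values:
            if value not in STAPEL_CATEGORIES:
                raise ValueError(f"{value} is not a Stapel category")
        profile[term] = sum(values) / len(values)
    return profile


# Hypothetical ratings of one store by three respondents.
ratings = {"high quality": [4, 3, 5], "poor service": [-2, -4, -3]}
print(STAPEL_CATEGORIES)
print(stapel_profile(ratings))
```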

Source: (Malhotra & Dash, 2014)


A concept map for Noncomparative scales
Thank You
