Statistics in Behavioural Sciences
Course Code –PHA314B
Faculty Name – Dr Heenakshi Bhansali
Test Construction
COURSE DESCRIPTION
Test Construction
• Test construction is the set of activities involved in developing and
evaluating a test of some psychological function.
Test Construction
• The items are pretested and selected on the basis of difficulty value,
discrimination power, and relationship to clearly defined objectives
in behavioural terms.
• Practical Criteria
• Technical Criteria
Practical Criteria    Technical Criteria
Time                  Reliability
Cost                  Objectivity
Purpose               Standardization
Acceptability         Items
Purpose of Good Test
• Compare two or more individuals on the same traits.
Steps for Test Construction
Planning the test (Objectivity)
• The purpose of the plan is to present the rationale for the test and to
guide the preparation and evaluation of the items to be used in it. The
stated purpose also serves as a general guideline for a potential user
when judging the quality of the test and how well its purpose was
achieved.
Planning the test (Objectivity)
The test developer should plan well in advance and have a clear idea
about:
Preparing the test (Item Selection)
• After the preparation of a test plan, the next step is item writing and
item evaluation. Item writing is the preparation of the test itself. The
items to be included in a test will depend upon the purpose for which
the test is constructed; if poor items are prepared, or if the items are
not related to the test's purpose, the test objectives cannot be met.
The test developer should have:
• thorough knowledge of the subject matter
• familiarity with different types of items, along with their advantages
and disadvantages
• a large vocabulary, i.e. he/she should know the different meanings
of a word
Preparing the test (Item Selection)
In item writing the following suggestions are taken into
consideration:
Preparing the test (Item Selection)
Item evaluation is the process of judging the adequacy of test items to
fulfill the designated purpose of a test.
Establishment of reliability and validity
• This is the most important part of test standardization, because only
after this process is the test eligible for use. In addition to validity, it
is essential that every test possess a definite degree of reliability; only
then can the conclusions drawn from the test be considered reliable
and worthy of trust.
• Along with this, the norms are also to be developed.
Establishment of reliability and validity
Reliability of the test:
• Reliability of a test refers to the quality that inspires confidence and
trust in the measurement. This quality can be attributed only to a test
that yields the same score every time it is administered to the same
individual.
• The term reliability basically refers to the extent to which a test can
be relied upon, i.e. it gives consistent scores even when administered
to the same group after repeated intervals of time.
• If a test yields one score for an individual at one time, and a
different score for the same individual when administered at another
time, such a test clearly cannot be considered reliable.
Establishment of reliability and validity
• The reliability of a test is a property of the test as a whole, not of
any one part of it; it is essential that the internal parts of a test
possess internal consistency and uniformity.
• When the test is finally composed, it is administered again to a fresh
sample in order to compute the reliability coefficient. This sample,
too, should not be smaller than 100.
• Reliability is calculated through the test-retest method, the split-half
method, and the equivalent-form method. Reliability shows the
consistency of test scores.
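As a sketch of two of these methods, the following computes a test-retest coefficient by correlating scores from two administrations, and a split-half coefficient corrected to full length with the Spearman-Brown formula. All score lists and the helper `pearson_r` are made up for illustration:

```python
# Illustrative sketch (all data are hypothetical): test-retest and
# split-half reliability for a small sample of examinees.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Test-retest: correlate total scores from two administrations
# of the same test to the same group after a time gap.
scores_t1 = [12, 15, 9, 20, 17, 11, 14, 18, 10, 16]
scores_t2 = [13, 14, 10, 19, 18, 10, 15, 17, 11, 15]
r_test_retest = pearson_r(scores_t1, scores_t2)

# Split-half: correlate odd-item and even-item half scores from a
# single administration, then correct to full-test length.
odd_half  = [6, 8, 4, 10, 9, 5, 7, 9, 5, 8]
even_half = [6, 7, 5, 10, 8, 6, 7, 9, 5, 8]
r_half = pearson_r(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown correction

print(round(r_test_retest, 3), round(r_half, 3), round(r_full, 3))
```

The Spearman-Brown step is needed because correlating two half-tests estimates the reliability of a test only half as long as the real one.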
Establishment of reliability and validity
Measuring Reliability: Reliability can be measured in the following
four ways.
Validity of the test:
• Validity refers to what the test measures and how well it measures
it. If a test measures well the trait it is intended to measure, it can be
said to be valid. Validity is the correlation of the test with some
outside, independent criterion.
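The validity coefficient described above can be sketched as a Pearson correlation between test scores and an external criterion measure. The scores, criterion ratings, and helper function below are all hypothetical:

```python
# Hypothetical sketch: criterion validity as the correlation between
# test scores and an outside, independent criterion (e.g. ratings).
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

test_scores = [55, 62, 48, 70, 66, 59, 73, 51]     # made-up test scores
criterion   = [3.1, 3.5, 2.8, 4.2, 3.9, 3.3, 4.4, 2.9]  # made-up criterion
validity = pearson_r(test_scores, criterion)        # validity coefficient
print(round(validity, 2))
```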
Norms
• The total raw score on a psychological test is generally interpreted
by referring to norms, which depict the scores of the standardization
sample. Norms are established empirically by determining how
individuals from a specified group actually perform on the test.
• To determine accurately a subject's (individual's) position with
respect to the standardization sample, the raw score is transformed
into a relative measure.
• For example, the average IQ on a standardized intelligence test is
about 100. This means that people typically or normally score at or
near 100 on such a test; scores that are significantly lower or higher
are considered atypical or much less common.
Norms
• Norms have been defined as the standard performance of a group
of pupils on a test. There is a difference between a norm and a
standard: norms indicate the actual achievement of individuals in a
standardization group, while standards indicate the desired level of
performance.
• Norms are averages or values determined by actual measurement of
a group of persons who are representative of a specific population,
whereas a standard expresses a desirable objective, which can be
lower or higher than the obtained norm.
• To prepare norms, a test is administered to a large representative
sample and the raw scores are converted into percentiles or standard
scores. Norms serve as measures of the central tendency of the
scores of a given group.
Characteristics of Norms
Novelty: Being novel, or up to date, means that the norms should not
be outdated, i.e. norms should not be constructed from test scores
gathered long ago, because the passage of time can change students'
abilities.
Types of Norms
• Norms are used to compare an individual's performance with the
performance of the most closely related group. They carry a clear,
well-defined quantitative meaning and can be used in most
statistical analyses.
• Norms can be classified into four types: age norms, grade norms,
percentile norms, and standard score norms.
Age Norms:
• The assumption underlying age norms is that the variable increases
with age. An age norm is obtained by taking the mean raw score of
all members of a given age group within the standardization sample.
Age norms thus represent the average scores of individuals of
different ages. They apply only to variables that increase with age,
such as height, weight, intelligence, reading ability, vocabulary, and
mental age. Hence, the 15-year norm is represented by the mean raw
score of students aged 15 years.
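A minimal sketch of computing age norms from a standardization sample; the (age, score) pairs below are invented for illustration:

```python
# Sketch (hypothetical data): an age norm is the mean raw score of
# all test-takers of a given age in the standardization sample.
from collections import defaultdict

sample = [  # (age, raw score) pairs from a standardization sample
    (14, 30), (14, 34), (14, 32),
    (15, 38), (15, 40), (15, 42),
    (16, 45), (16, 47), (16, 46),
]

by_age = defaultdict(list)
for age, score in sample:
    by_age[age].append(score)

# The norm for each age is the mean raw score of that age group.
age_norms = {age: sum(s) / len(s) for age, s in by_age.items()}
print(age_norms)  # e.g. the 15-year norm is the mean of the 15-year-olds
```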
Types of Norms
Grade Norms:
• Grade norms are related to school grade (class) and are also called
class norms. A grade norm on a test is the average score of students of a
given class. It is calculated by finding the mean raw score earned by
students in a specific grade, established by administering the test to
students of each grade. Grade norms are mostly established for
achievement tests and relate to the performance of average students of
each class.
Percentiles (P(n) and PR):
• Percentiles refer to the percentage of people in the standardization
sample who fall below a given score. In computing a percentile norm, a
candidate is compared with the group of which he/she is a member;
percentiles thus depict an individual's position with respect to the
sample. Counting begins from the bottom, so the higher the percentile,
the better the rank. For example, the 75th percentile tells us that 75% of
students scored below this score and only 25% obtained scores above it.
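A percentile rank of this kind can be sketched as follows. The scores are hypothetical, and this simple version counts only scores strictly below the given one (some definitions also credit half of any tied scores):

```python
# Sketch: percentile rank = percentage of the standardization sample
# scoring below a given raw score (data are hypothetical).

def percentile_rank(scores, raw):
    """Percentage of scores strictly below `raw`."""
    below = sum(1 for s in scores if s < raw)
    return 100 * below / len(scores)

sample_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
pr = percentile_rank(sample_scores, 75)
print(pr)  # 70.0 -> 70% of the sample scored below a raw score of 75
```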
Types of Norms
Standard Score:
• A standard score expresses the gap between an individual's score
and the mean in units of the standard deviation of the distribution.
Standard score norms convert the raw scores of candidates into
standard scores, computed with the help of the standard deviation
(S.D. or σ).
• The standard deviation is a measure of the spread of the scores of a
group. Standard scores can be derived by a linear or nonlinear
transformation of the original raw scores; common forms are the
Z score and the T score.
• Standard scores can be used to compare raw scores taken from
different tests, especially when the data are at the interval level
of measurement.
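Converting a raw score into a z score and a T score (mean 50, SD 10) can be sketched as below; the raw score list is hypothetical:

```python
# Sketch: a z score is the distance of a raw score from the group mean
# in standard-deviation units; a T score rescales it to mean 50, SD 10.
from statistics import mean, pstdev

raw_scores = [52, 60, 45, 70, 58, 63, 48, 55, 66, 50]  # hypothetical group
m, sd = mean(raw_scores), pstdev(raw_scores)

raw = 70
z = (raw - m) / sd    # standard (z) score
t = 50 + 10 * z       # T score: linear transformation of z
print(round(z, 2), round(t, 1))
```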
Item Writing
Item writing refers to the process of constructing the items required for a
given test or assessment. Item writing is essentially a creative art. There are
no set rules to guide and guarantee the writing of good items. A lot depends
upon the item writer’s intuition, imagination, experience, practice, and
ingenuity.
A good test item must have the following characteristics:
Clarity in meaning
Clarity in reading
Discriminating power
Not too easy or too difficult
Doesn't encourage guesswork
Gets to the point
Item Writing
To make sure that the test items meet these standards, the item writer
follows some general guidelines for item writing:
Item Writing Guidelines
Avoid Opinion-Based Items
• Never ask "What would you ... do", " ... use", " ... try", etc. The
examinee's answer can never be wrong.
• Use caution when asking for the "best" thing, or the "best" way of
doing something, unless it is clearly the best amongst the options.
• If experts' opinions differ about what the "best" is, then avoid
using it.
• Qualify the standard for "best" (i.e., according to ... ).
Item Writing Guidelines
Avoid Absolute Modifiers such as always, never, only, and none.
• The use of absolute modifiers in options makes it easy to eliminate options,
increasing the guessing probability.
Item Writing Guidelines
Avoiding Excessive Verbiage
• "Verbosity is an enemy to clarity."
• Wordy items take longer to read and answer, meaning fewer items
can be presented in a fixed amount of time, reducing reliability.
• Write items as briefly as possible without compromising the
construct and cognitive demand required.
• Get to the point in the stem and present clean, clear options for the
examinee to choose.
• Avoid unnecessary background information.
Item Writing Guidelines
Keep Items Independent
• Content should be independent from item to item.
• Don't give the answer away to one item in the stem of another.
• Don't make answering one item correctly dependent on knowing the
correct answer to another item.
Item Analysis
• Item analysis is a general term for a set of methods used to evaluate
test items. Items can be analyzed qualitatively in terms of their
content and form and quantitatively in terms of their statistical
properties.
• The main purpose of the experimental try-out is to find out the
weaknesses, ambiguities, and inadequacies of the items.
Item Analysis
Pre-Try Out
Item Analysis
Proper Try Out (Item Analysis)
Item analysis is the technique of selecting discriminating items for the
final composition of the test. The sample for the proper try-out should
be about 400 and needs to be similar to those for whom the test is
intended.
It aims at obtaining three kinds of information regarding the items.
1) Item Difficulty
2) Discriminatory power of items
3) Effectiveness of distractors
Importance of Item Analysis
Methods of Item Analysis
There are two basic methods of item analysis: item difficulty and item
discrimination.
Item Difficulty index
• Items that are too easy or too difficult do not affect the variability of
test scores, so they contribute nothing to the reliability or validity of
the test.
• The closer an item's difficulty approaches 1.00 or 0, the less
differential information about test-takers it contributes; conversely,
the closer the difficulty level approaches .50, the more
differentiations the item can make.
• Items within a test tend to be inter-correlated; the more
homogeneous the test, the higher these intercorrelations will be.
Moreover, the higher the item intercorrelations, the wider should
be the spread of item difficulty.
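A minimal sketch of both indices on a made-up response matrix: difficulty as the proportion answering correctly, and discrimination as the difference in that proportion between high and low scorers. Here the ranked sample is simply halved; many texts instead use the top and bottom 27%:

```python
# Illustrative sketch (hypothetical response matrix): item difficulty
# p = proportion of examinees answering the item correctly;
# discrimination D = p(upper group) - p(lower group).

responses = [  # rows = examinees, columns = items; 1 = correct, 0 = wrong
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

n = len(responses)
totals = [sum(row) for row in responses]        # total score per examinee
order = sorted(range(n), key=lambda i: totals[i], reverse=True)
upper, lower = order[: n // 2], order[n // 2:]  # high vs low scorers

results = []
for item in range(len(responses[0])):
    p = sum(row[item] for row in responses) / n               # difficulty index
    p_upper = sum(responses[i][item] for i in upper) / len(upper)
    p_lower = sum(responses[i][item] for i in lower) / len(lower)
    results.append((p, p_upper - p_lower))                    # (p, D)

for item, (p, d) in enumerate(results):
    print(f"item {item}: difficulty p = {p:.2f}, discrimination D = {d:.2f}")
```

An item with D near 0 (or negative) fails to separate strong from weak examinees and is a candidate for revision or removal.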
Item Analysis
Final Try Out
• The items selected after item analysis constitute the test in its final
form. The final try-out is carried out to detect any minor defects that
may not have been caught in the first two preliminary
administrations.
Scale Construction
• A measurement scale is used to qualify or quantify data variables in
statistics.
Characteristic of Scale
Types of Scale