EDU 303 - Lecture Note


TEST AND MEASUREMENT

EDU 303

S. Y. Tsagem, PhD
Dept Of Educational Foundations, FEES
Usmanu Danfodiyo University Sokoto
July, 2023

Principles of Test Construction


Test construction simply implies a systematic process of assembling test items or the
preparation of a test by drawing and compiling a series of questions which constitute the task for the
testee(s). Test validation, on the other hand, is defined by Gronlund (1981) as a procedure for
standardising test items by treating them statistically to remove all sources of bias in the process of
making them valid, reliable, objective and usable.
Itsuokor (1986) suggested an outline for effective test construction. According to him, a
satisfactory sample is most likely to be obtained when test preparation follows a systematic
procedure. He listed the following steps as being useful for this purpose.
a) General statement of instructional objective;
b) Making an outline of the content to be covered;
c) Preparation of a table of specification; and
d) Constructing test items that measure the objectives in the specified table.
By the same token, Okobia (1990) stated that in constructing and validating a test, certain
procedures are followed. As such, most psychometricians have come to agree that test preparation
is a procedural and systematic process. However, construction and validation strategies may vary
from one author to the other depending on the nature of the test. Brown (1976) reports that the
Educational Testing Service (ETS, 1965) suggested a typical sequence of construction and
validation, stated as follows:
1. Planning the test.
2. Writing the items.
3. Pretesting the items.
4. Preparing the final form.
5. Collecting the reliability and validity evidence.
6. Developing normative and interpretative materials.
Planning the Test
In constructing a test for use, the test constructor has to consider the content of the areas
which the test is designed to cover as well as the objective which the test is designed to achieve
when administered. Ebel (1979:108) states that the firmest basis for the construction of a good
test is a set of explicit specifications that depicts the following:


1. Forms of test items to be used;
2. Number of items of each form;
3. Kinds of tasks the items will present;
4. Number of tasks of each kind;
5. Areas of content to be sampled;
6. Number of items in each area; and
7. Level and distribution of item difficulty.
The implication of the above is that in planning a test, the use of a test blueprint or table of
specification is necessary.

Table of Specification
A table of specification, or a test blueprint, is a two-dimensional table or chart showing
the test objectives and the content to be tested or measured. Test specification of this nature guides
the test constructors with respect to the content of the test and the objectives to be measured. The
prepared two-way grid which is often called a test blue-print helps the test constructor to relate the
content or topic to the objectives already stated in the curriculum if an achievement test is
constructed.
In the table of specification, the test constructors show the number of questions or test items
and the percentage of the items in each topic in the blue-print taking into consideration all the steps
postulated by Ebel (1979) which have been stated above. At this point, items or questions are
drawn, following rigidly the plan set in the table of specification. Note that the table of specification is
highly essential in ensuring the content validity of the test. Consider the following illustration adapted
from Nwana (1979:21).
Table 1: Table of Specification for Geography of Africa

Contents                   Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
                             (40%)        (25%)         (15%)      (10%)      (5%)        (5%)
Political Division (30%)       24           15             9          6         3           3        60
Ethnic Groups (30%)            24           15             9          6         3           3        60
Physical Features (15%)        12            7             5          3         1           2        30
Climatic Zones (15%)           12            8             4          3         2           1        30
Economic Geography (10%)        8            5             3          2         1           1        20
Total                          80           50            30         20        10          10       200

Source: Nwana (1979:21)
In the above illustration, some of the calculations were rounded off to the nearest whole numbers.
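
The arithmetic behind such a blueprint can be sketched in a few lines of Python: each cell is the total number of items multiplied by the content weight and the objective weight, with the figures taken from Table 1. This is a minimal sketch for illustration only; rounding the products is what produces the small adjustments noted above.

TOTAL_ITEMS = 200  # figures taken from Table 1 (Nwana, 1979)

objectives = {"Knowledge": 0.40, "Comprehension": 0.25, "Application": 0.15,
              "Analysis": 0.10, "Synthesis": 0.05, "Evaluation": 0.05}
contents = {"Political Division": 0.30, "Ethnic Groups": 0.30,
            "Physical Features": 0.15, "Climatic Zones": 0.15,
            "Economic Geography": 0.10}

for topic, c_weight in contents.items():
    # each cell = total items x content weight x objective weight, rounded
    cells = [round(TOTAL_ITEMS * c_weight * o_weight)
             for o_weight in objectives.values()]
    print(f"{topic:25s} {cells}  row total = {sum(cells)}")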


Writing the Items


The blueprint, as stipulated earlier on, helps the test constructor to write appropriate
items. These could be objective or essay or both, depending on what type of test the constructor has
in mind. After writing the items, the items are then edited to ensure that each item is of acceptable
quality. This item editing involves a review of the test items by the test constructor and a testing
specialist (Lien, 1976). This review usually covers the content validity and accuracy, grammatical
construction and possible flaws in the test.
In using multiple-choice items, more items than are required are usually constructed. In
constructing the items, all the necessary steps recommended by Ebel (1979) are taken into
consideration. The number of items required is then selected from the item pool, utilizing the
pretest data to identify the items considered appropriate by the educator (Brown, 1976).
Pretesting the Items
After reviewing and editing the items, the test constructor then sets out to pretest the edited
items on a sample of candidates representative of those for whom the test is finally designed.
Necessary item statistics are computed, on the basis of which some items are discarded or revised,
while others are accepted in their original form. Where necessary, further try-outs and analysis can
be carried out to further purify the items involved. The test items are then assembled into the
number of test forms needed in the final test.
At this stage of pretesting, testing conditions must be specified. The necessary guidelines
for administering and scoring the test have to be developed. It is also pertinent at this stage to
set time limits for both the test takers and the test administrators. The information extracted from
the pretesting will guide the test constructor in setting the exact time limits for the final test,
which should be determined by practical considerations of the amount of time available for testing
and the number of items needed for an adequate level of reliability.
All required instruments like answer sheets, scoring keys and the procedures for enhancing
accurate and reliable scoring must be developed.

Preparing the Final Forms


At this stage, the final forms of the test are selected from the results of the item analysis of
the pretested items. The items selected in each of the forms must be proportioned to meet the
necessary conditions so as to produce a test with the required characteristics. They should represent
the various skills and content categories as stipulated in the test plan; the number of items must be
appropriate for the time limit; the item difficulty level and discriminating power must be established;
a wide distribution of scores must be obtained; and all the aspects of the test must be equated to each
other (Brown, 1976). The items will then receive the final blessing of the test constructor and other
specialists before the test is finally administered to a representative sample of the student population
(Lien, 1976).
Test Manual
There is a consensus among test constructors that the final step in test construction and
validation is the preparation of a test manual. The test manual, according to Okobia (1990), contains
a detailed description of the test, how the test was constructed and its main qualities and
characteristics.
Most test manuals contain information on how the test should be administered. The manual
should also contain directions on how the test should be scored and interpreted as well as the
necessary materials that can aid the test users in using the test scores or results (Ipaye, 1982;
Brown, 1983).

Characteristics of a Good Test


Any test that is considered good and designed for use in measuring must meet the following
qualities or characteristics:
1. Validity;
2. Reliability;
3. Objectivity; and
4. Usability.
Validity
Validity simply means the ability of a test to measure what it is designed to measure. In
other words, the validity of a test is the extent to which a test measures what it is supposed to
measure. Therefore, a test is valid if it is adequate, correct and appropriate in measuring all the
aspects of the designated content to be measured. Validity is, no doubt, the most important
characteristic of a good test. It is when all the aspects of the validity of the test have been
established that the other qualities of the test can be determined. In fact, the test validity is the basis
for establishing the other properties of the test, particularly the reliability of the test.
There are several types of test validity. They include:


a) Content validity;
b) Face validity;
c) Construct validity;
d) Concurrent validity; and
e) Predictive validity.
(a) Content validity: This is concerned with the extent to which test items are the representative
sample of all the possible items that measure the subject matter in any curriculum area (Chase,
1978). In establishing the content validity of a test, the compiler of the test requires the use of the
blueprint or a table of specification and, more importantly, the consensus of expert judgement on
the area that is being tested.
(b) Face Validity: This implies the cosmetics or the physical appearance of the test with respect to
the format for presenting or reporting the test items, the typing and the general outlook of the test.
The evidence of face validity is the recognition of a test in terms of what it seeks to measure; for
example, the ability to recognize a mathematics test because of the presence of numbers and other
mathematical signs and symbols, such that a mathematics test cannot be said to be an English test or
a chemistry test.
(c) Construct Validity: A construct per se is a human characteristic that may be psychological or
sociological in nature. Concepts like intelligence, attitude, and aptitude are examples of constructs.
Therefore, construct validity is the extent to which the basic constructs in a particular test are
measured. Thus, a test is construct valid if it truly measures the constructs that it is supposed to
measure.
(d) Concurrent Validity: This is a form of criterion-related validity. It shows the extent to which
the scores of the particular test are related to those of another similar test. The concurrent validity
of a test is established by correlating the scores of the test that is being constructed with the scores
of an already standardized test. The higher the coefficient of correlation, the higher the level of
concurrent validity.
(e) Predictive Validity: A test possesses predictive validity if it is capable of predicting the future
outcome or performance of testees. For instance, the University Matriculation Examination (UME)
has predictive validity if it is capable of predicting the performance of students who eventually
enter the University, such that their UME scores correlate with their academic performance during
their University programme.


Reliability
This is the ability of a test to measure consistently when administered to the same person
or group of persons at different times. To be sure, a test is reliable if it generates the same or related
scores when it is administered on more than one occasion to the same group of persons.
There are different ways of establishing, or determining, the reliability of a test. These
include:
a) Test-retest method;
b) Split-half method;
c) Parallel or equivalent form method; and
d) Kuder-Richardson's method.
In using any of these methods or techniques, efforts are made to determine the coefficient
of relationship between scores generated on different occasions. A high positive relationship
implies high reliability. Usually, the coefficient of relationship ranges from -1 to +1, where -1 is a
perfect negative correlation and +1 is a perfect positive correlation.
(a) Test-retest Method: This method involves administering the test on two separate occasions
after some interval (i.e., test 1 and test 2). When this is done, the correlation between the test 1
and test 2 scores is calculated. On this basis, the test is considered reliable if the computed
coefficient of reliability is greater than 0.50. The test-retest technique measures consistency over a
period of time. Apart from the ease of administration and procedure, the method is limited in the
area of logistics due to sample mortality and the task involved in arranging for the second testing.
More significantly, the scores generated in the second testing may not be valid since the testees are
already familiar with the test items which were administered before; that is, the students have
become test-wise.
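
As a minimal sketch, the test-retest coefficient can be computed as a Pearson correlation between the two sets of scores; the figures below are invented for illustration.

from statistics import correlation  # available from Python 3.10

test_1 = [55, 62, 48, 70, 66, 59, 73, 51]  # invented scores, first administration
test_2 = [58, 60, 50, 72, 63, 61, 70, 49]  # the same testees, second administration

r = correlation(test_1, test_2)  # Pearson coefficient of relationship
print(round(r, 2))  # a value above 0.50 would be taken as evidence of reliability
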
(b) Parallel or Equivalent Form Method: The constraints of administering the test twice on
different occasions are, no doubt, very cumbersome. Thus, the equivalent form method allows for
administering the test once in different forms. The method involves administering two or more parallel forms of a
test that have been produced in such a way that it seems likely that scores on these alternate forms
will be equivalent. The selected samples in the group will be given both forms of the test and then
the correlation between the scores on the two forms of the test is computed. The calculated
coefficient of reliability provides an estimate of the reliability.


(c) Split-Half Method: In this case, only one test is administered once to a group of testees.
However, in scoring the test, the scores of the testees are organized into two halves, the first half
being the total scores on the even-numbered items and the second half being the total scores on the
odd-numbered items. Once this is done, the correlation between the two sets of scores is computed.
The computed correlation coefficient is subjected to further statistical manipulation. This
is necessary because only one half of the test is involved. Therefore, to obtain the reliability of the
entire test, the Spearman-Brown prophecy formula is employed:
R2 = 2R1 / (1 + R1)

Where R2 = Reliability of the whole test
R1 = Reliability of the half test
1, 2 = Constants

Suppose R1 = 0.80:
R2 = (2 × 0.80) / (1 + 0.80)
R2 = 1.60 / 1.80
R2 = 0.89
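
A minimal Python sketch of the split-half procedure, with invented half-test scores; the spearman_brown function encodes the formula above and also reproduces the worked example.

from statistics import correlation  # available from Python 3.10

def spearman_brown(r1):
    # R2 = 2*R1 / (1 + R1), the prophecy formula above
    return 2 * r1 / (1 + r1)

# invented odd-item and even-item half scores for six testees
odd_half = [10, 14, 9, 16, 12, 11]
even_half = [11, 13, 9, 15, 13, 10]

r1 = correlation(odd_half, even_half)  # reliability of the half test
print(round(spearman_brown(r1), 2))    # estimated reliability of the whole test
print(round(spearman_brown(0.80), 2))  # reproduces the worked example: 0.89
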
(d) Kuder-Richardson Method: Kuder-Richardson Formula 20, or KR-20, is a measure of
reliability for a test with binary variables (i.e. answers that are right or wrong). Reliability refers to
how consistent the results from the test are, or how well the test is actually measuring what you
want it to measure.
The KR-20 is used for items that have varying difficulty. For example, some items might
be very easy, others more challenging. It should only be used if there is a correct answer for each
question; it should not be used for questions with partial credit or for scales like the Likert scale.
If all questions in your binary test are equally challenging, use the KR-21. If you have a test with
more than two answer possibilities (or opportunities for partial credit), use Cronbach's Alpha
instead (Stephanie, n.d.).
KR-20 Scores
The scores for KR-20 range from 0 to 1, where 0 is no reliability and 1 is perfect reliability. The
closer the score is to 1, the more reliable the test. Just what constitutes an “acceptable” KR-20
score depends on the type of test. In general, a score above 0.5 is usually considered reasonable.
Compute p*q for each item, then apply the formula once for the whole test: KR-20 = [n/(n-1)] * [1 - (Σp*q)/Var]


Where:
n = number of items on the test,
Var = variance of the total test scores,
p = proportion of people passing the item,
q = proportion of people failing the item (q = 1 - p).
Σ = sum up (add up). In other words, multiply each item's p by q, and then
add them all up. If you have 10 items, you'll multiply p*q ten times, then you'll
add those ten products up to get a total.
KR-21
The KR-21 is similar, except it is used for a test where the items are all about the same difficulty.
The formula is: KR-21 = [n/(n-1)] * [1 - (M*(n-M))/(n*Var)]
Where:
n = number of items on the test,
Var = variance of the total test scores,
M = mean score for the test.
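
A minimal sketch of both formulas, computed from an invented 0/1 score matrix (rows are testees, columns are items); with such a tiny data set the coefficients come out low, but the arithmetic is the point.

from statistics import mean, pvariance

# invented 0/1 score matrix: rows = testees, columns = items
scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
]

n = len(scores[0])                     # n = number of items, as defined above
totals = [sum(row) for row in scores]  # each testee's total score
var = pvariance(totals)                # Var = variance of the total scores
M = mean(totals)                       # M = mean total score

# KR-20: sum p*q over the items, where p = proportion passing an item
pq = 0.0
for j in range(n):
    p = sum(row[j] for row in scores) / len(scores)
    pq += p * (1 - p)
kr20 = (n / (n - 1)) * (1 - pq / var)

# KR-21: same n, Var and M, assuming items of roughly equal difficulty
kr21 = (n / (n - 1)) * (1 - (M * (n - M)) / (n * var))

print(round(kr20, 2), round(kr21, 2))
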
Usability
Usability of a test implies the extent to which a test can be used with a minimum expenditure of
time, energy and other resources. In other words, a test is usable if it has economy of
administration, scoring, time and money in terms of cost, space, materials and personnel.
Objectivity
Objectivity in testing implies the absence of human bias. A test is objective if the scores
obtained by individuals from different cultural and geographical backgrounds do not differ
significantly. A test is, therefore, objective if it does not favour any person or group of persons on
the basis of race and other factors.
On the whole, a test is considered good if it is valid, reliable, usable and objective.

Item Analysis
Item analysis is the process of examining the students' responses to each test item in order to
judge the quality of the items. It is a statistical technique of reviewing every item on the test with
a view to refining the whole test (Obe, 1980). The technique helps us not only to identify poor
items, but also to decide why an item is not functioning as planned. Items can be analysed
qualitatively, in terms of their content and form, and quantitatively, in terms of their statistical
properties (Anastasi, 1982). Qualitative analysis includes the consideration of content validity, and
the evaluation of items in terms of effective item-writing procedures. Quantitative analysis
includes principally the measurement of item difficulty and item discrimination. Item analysis
helps us to answer such questions as the following for each item:


1. How hard is the item?
2. Does it distinguish between the better and poorer students?
3. Do all the options attract responses, or are there some that are so unattractive that they might
as well not be included?
Three things are considered in item analysis, namely: item difficulty, discriminating ability
of the item and the effectiveness of the distracters.

Item Difficulty
The item difficulty pertains to the easiness of the item. A good item should be neither too
easy nor too difficult. The difficulty index of an item is the proportion of the testees who got the
item right. The difficulty index ranges from zero to one (or from zero to 100%). Items whose
difficulty index ranges from 20 to 80 percent are acceptable. The best difficulty index is 0.5. An
item should not be so difficult that almost all the testees miss it or so easy that every testee gets
it right. The formula for item difficulty (P) is:

P = (U + L)/N × 100

Where P = the difficulty index for a given item.
U = number of students in the upper group who got the item right.
L = number of students in the lower group who got the item right.
N = number of students in the item analysis group.

Item Discriminating Power


The discriminating power of an item is the extent to which each item distinguishes between
those who scored low or high in the test. It measures how well a test item contributes to separating
the upper and lower group. Item discrimination tells us if an item is showing the differences
between capable and less capable students. A good discriminating item is one which a greater
number of the high-scoring students get right and few of the low-scoring students get right. The
discriminating index (D) can take values ranging from -1.00 to +1.00. The higher the D value, the
better the item discrimination. Any item that has a D value of +.40 and above is considered very
effective. However, D values that range between +.20 and +.39 are considered satisfactory. Any
item that has a negative value should be discarded. The formula for the item discriminating
power is:


D = (U - L) / (N/2)

Where D = item discriminating power
U = number of students in the upper group who got the item right.
L = number of students in the lower group who got the item right.
N = number of students in the item analysis group.
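
Both indices reduce to one line of arithmetic each; a minimal sketch, checked against item 2 of the worked example that follows.

def difficulty(U, L, N):
    # P = (U + L)/N x 100
    return (U + L) / N * 100

def discrimination(U, L, N):
    # D = (U - L)/(N/2)
    return (U - L) / (N / 2)

# item 2 of the worked example below: U = 14, L = 9, N = 40
print(difficulty(14, 9, 40))      # 57.5 -> within the acceptable 20-80% range
print(discrimination(14, 9, 40))  # 0.25 -> satisfactory (+.20 to +.39)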

Effectiveness of Distracters
After identifying the poor test items, such as items that are too easy, too difficult, or those
with zero and negative discrimination, there is a need to ascertain what is wrong with these items.
Analysing the effectiveness of distracters entails a comparison of the responses of students in the
upper and lower groups. In doing this, the following points should be kept in mind:
1. Each distracter should be selected by about an equal number of students in the lower group.
2. Substantially more students in the upper than the lower group should respond to the correct
alternative.
3. Substantially more students in the lower group should respond to the distracters. If a distracter
confuses only the upper group, it is likely faulty. If it distracts both groups, replace it. This
could be because of a vague or ambiguous question or because the examiner marked with a
wrong key.

Steps in Item Analysis


1. Administer the test, score the items and arrange the students’ scores in order of merit (highest
to lowest).
2. Select the item analysis group (N). This is made up of:
(i) The upper group (best 30% or so).
(ii) The lower group (last 30% or so).
3. Beginning with item number one, count how many students in the upper group (U) got the
item right. Thereafter, count how many students in the lower group (L) got the item right.
4. Repeat step 3 for other items.
5. For each item, compute the item difficulty.
6. For each item, compute the item discrimination power.
7. Identify the poor items and analyse their item choices or the effectiveness of the distracters.


Example:
Items 1, 2, 3, 4, 5, ..., 48, 49 and 50 on a Mathematics achievement test were passed by the
following numbers of "upper" and "lower" students:

Item Number            1    2    3    4    5   ...   48   49   50
20 "Upper" Students   18   14    5   20   14   ...    8   12   10
20 "Lower" Students   16    9    2   20   11   ...   16   10    5

Obtain the item difficulty and item discriminating power per item and comment.
Solution (N = 40; upper and lower groups of 20 each):

Item  Upper  Lower  Difficulty               Discriminating Power   Remarks
No    (U)    (L)    P = (U + L)/N × 100      D = (U - L)/(N/2)
1      18     16    34/40 × 100 = 85%         2/20 =  0.10          Too easy (Reject)
2      14      9    23/40 × 100 = 57.5%       5/20 =  0.25          Good (Accept)
3       5      2     7/40 × 100 = 17.5%       3/20 =  0.15          Too difficult (Reject)
4      20     20    40/40 × 100 = 100%        0/20 =  0.00          Too easy (Reject)
5      14     11    25/40 × 100 = 62.5%       3/20 =  0.15          Good (Accept)
48      8     16    24/40 × 100 = 60%        -8/20 = -0.40          Negative discrimination (Reject)
49     12     10    22/40 × 100 = 55%         2/20 =  0.10          Good (Accept)
50     10      5    15/40 × 100 = 37.5%       5/20 =  0.25          Fair (Accept)

Based on the calculations above, items 2, 5, 49 and 50 are satisfactory because their
difficulty level and discriminating power are within acceptable ranges. Item number 3 should be
rejected because it is too difficult. Similarly, items 1 and 4 are too easy and should be rejected.
Even though item 48 has a good difficulty level, its discriminating index is negative. It should
therefore be discarded. The examiner should undertake item choice analysis to determine why
items 4 and 48 recorded zero and negative discriminating power respectively. Either the
questions were vague or the examiner may have used a wrong key in marking the items.
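
A minimal sketch that reproduces the decisions in the table above from the U and L counts, using the acceptable ranges stated earlier.

# upper and lower counts (U, L) per item, from the example table above
items = {1: (18, 16), 2: (14, 9), 3: (5, 2), 4: (20, 20),
         5: (14, 11), 48: (8, 16), 49: (12, 10), 50: (10, 5)}
N = 40  # item analysis group: 20 "upper" plus 20 "lower" students

for no, (U, L) in items.items():
    P = (U + L) / N * 100
    D = (U - L) / (N / 2)
    if P > 80:
        remark = "too easy (reject)"
    elif P < 20:
        remark = "too difficult (reject)"
    elif D < 0:
        remark = "negative discrimination (reject)"
    else:
        remark = "accept"
    print(f"item {no:2d}: P = {P:5.1f}%  D = {D:+.2f}  -> {remark}")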

Measurement Instruments
Measuring instruments are psychological tools designed to elicit one form of behaviour or
the other. There are essentially two main categories of measuring instruments:
1. Test techniques, and
2. Non-test techniques.
Test Techniques: These are designed in the form of questions, exercises, or puzzles to which the
testee is required to provide answers. Again, there are specified answers to these questions or
puzzles. Test techniques are very important in measuring the cognitive domain. Examples of test
techniques include: achievement tests, intelligence tests, and aptitude tests.
Non-Test Techniques: These, on the other hand, are instruments used in predicting or verifying a
behaviour trait. They are designed in the form of rating scales, checklists, questionnaires, etc.
Sometimes, observational techniques and interview are regarded as non-test techniques.
Non-test techniques are mostly used in measuring non-cognitive behaviour domains such
as Affective and Psychomotor domains or behaviours.

Standardized and Teacher-Made Tests


Achievement Test
An achievement test is a test that measures the extent to which a person has "achieved" something,
acquired certain information, or mastered certain skills - usually as a result of planned instruction
or training (Mehrens and Lehmann, 1978). Achievement tests include all kinds of devices used
primarily to assess how well the instructional objectives have been attained. Achievement tests
attempt to measure what an individual has learned, that is, his present level of performance (Best,
Achievement tests are designed to measure the degree of students' learning in specific
curriculum areas common to most schools, such as mathematics, English usage, and reading. The
test may be a paper-and-pencil device administered to all children at one time or a set of oral
questions administered individually.
Types of Achievement Tests
The Educational Testing Service (Shertzer and Linden, 1979) classified achievement tests into
three types:
1. End of course achievement tests that measure specifically what a student has learned in a
particular subject.
2. General achievement tests that cover a student's learning in a broad field of knowledge and
can be given to students who have taken quite different courses of study within a field.
3. Tests that measure the critical skills a student has learned and his or her ability to use these
skills in solving new problems.


Achievement tests can also be classified according to standardisation (teacher-made versus
standardised tests) or reference (norm- versus criterion-referenced tests).

Teacher-Made Achievement Test


Teacher-made tests are the tests constructed, administered, and scored by the classroom
teacher, or possibly by a committee of several teachers in the same school. Teacher-made tests
are those constructed by a teacher to measure the learning outcomes of students in his classroom. The
items are built around the definite, specific objectives taught in the class. Teacher-made tests
are usually based on the content of the curriculum of a particular course or school. The items in
teacher-made tests are hardly subjected to item analysis. Similarly, the items are seldom analysed
for reliability, but when they are, the reliability coefficients are typically only moderate. However,
Mehrens and Lehmann (1978) reported that teacher-made tests are valuable because they:
1. Assess the extent and degree of student progress with reference to specific classroom
activities.
2. Motivate students.
3. Permit the teacher to ascertain individual pupil strengths and weaknesses while the pupil
is studying a particular subject matter area.
4. Provide information for guidance and counselling purposes.
5. Provide immediate feedback for the teacher, who can then decide whether a particular unit
or concept needs re-teaching and/or whether individual pupils are in need of remedial
instruction.
6. Provide for continuous evaluation (p.161).

Standardised Achievement Tests


A standardised test is one in which the procedure for administration has been established
so that everyone takes the test under the same conditions. Tasks in the standardised achievement
tests are defined and are common for all students across all schools. Standardised tests are written
by test experts for a national market. A standardised test is one which has been constructed in accord
with detailed specifications, one for which the items have been selected after tryout for
appropriateness in difficulty and discriminating power, one which is accompanied by a manual
giving definite directions for uniform administration and scoring, and one which is provided with
relevant and dependable norms for score interpretation (Ebel, 1965). Standardised achievement
tests are those tests whose procedure for administration and scoring has been made uniform or
standard. This uniformity makes it possible to compare the scores of students from different parts
of the country (Gibson, 1980; Reily and Lewis, 1983).
All items in standardised achievement tests are written by professional test developers, and
have been tried out and found to be satisfactory by the criteria of item analysis (Chase, 1978).
There is usually a manual of instruction on administration and scoring. In addition, the test must
have been pre-administered to a sample known as standardisation sample in order to obtain a norm
or standards by which the scores on the test may be interpreted. That is not all: standardised
achievement tests have, almost without exception, substantial reliabilities near, or into, the 0.90 range.

Criterion Referenced versus Norm Referenced Tests


The criterion-referenced tests are designed to assess an examinee's mastery of fundamental
skills or knowledge without reference to the performance of others. They are deliberately
constructed to yield measurements that are directly interpretable in terms of specific performance
standards. They measure whether or not an individual has attained the desired or maximum goal in
a learning experience. Norm-referenced tests are tests in which the student's performance is
compared with the performance of other students. They tell where a person stands in the population
that has taken the test.

Uses of Achievement Tests

1. To measure the quantity and quality of students' learning in a subject.
2. To determine the nature of individual differences in a group.
3. To place and classify students according to their ability into groups for instructional
purposes.
4. To aid in the assigning of grades to students after completing a prescribed course of
instruction.
5. To select students into an educational programme.
6. To provide feedback to students about their learning progress. This helps to motivate them
to learn effectively.
7. To diagnose students' learning problems, their strengths and weaknesses.
8. To provide a basis for the award of prizes, scholarships and certificates.


9. To provide a basis for the promotion of students to the next class level or for the retention of students.
10. To provide information needed in making educational and career decisions.

Construction of Essay Tests


Essay tests consist of a list of questions for which the subject (student) is required to write
out the answer (Ibanga, 1981). An essay item is a question or situation with instruction, which
requires the testee to organise a complete thought in one or more written sentences (Hopkins and
Antes, 1985). The testee is given freedom to generate responses, which must be assessed by a
scorer who is knowledgeable in the subject area. They are best suited for measuring students'
ability to originate and integrate ideas, their depth of knowledge and understanding, verbal
expression, creativity, and higher thought processes. Responses to essay questions may also reflect
students' attitudes, creativeness, and verbal fluency, factors that may or may not be relevant to the
purpose of the testing, thus making the test subjective.
Types of Essay Tests
Essay questions are subdivided into two major types - restricted and extended response,
depending on the amount of latitude or freedom given the student to organise his ideas and write
his answer.
Restricted Response Type
The restricted response essay questions are used to measure achievement in content
subjects. It is applicable at the lower primary school level. The amount of restriction in an essay
question depends on the educational level of the testee and the type of information required. In the
restricted response essay question, the student is more limited in the form and scope of his answer.
The teacher directs the pattern as well as limits the volume of response through the wording of the
questions. For example:
1. Describe any FIVE factors responsible for the fall of King Solomon.
2. Using two specific examples, evaluate the effectiveness of O.A.U. in settling disputes among
member nations. Answers will be given in not more than four foolscap pages.
Extended Response Type
In the extended response type of essay question, virtually no bounds are placed on the student as
to the point(s) he will discuss and the type of organisation he will use. It is used to measure students'
written expression, especially in the language arts. It is also used in the higher institutions of
learning. The extended response is open-ended and does not limit the student to the points for
discussion, type of organisation or approach to the proposed task. However, if the testee is left
completely free with regard to scope, problems could arise during marking. To forestall this,
some restrictions are included. For example: discuss the history of Nigeria from 1960 to 1990.
Advantages of Essay tests
1. Items can be written to allow students maximum freedom to respond.
2. They are useful in testing higher cognitive behaviours such as analysis, synthesis and
evaluation.
3. It requires less time to construct.
4. Gives opportunity to develop writing skills.
5. Minimises guesswork and cheating.
6. Measures writing, expression, memory organisation, problem solving and originality.
7. There is economy of question papers.
Disadvantages of Essay Tests
1. There is poor item sampling, especially in the extended response type of essay test.
2. It is difficult to write essay items with high reliability.
3. The student does not always understand the questions and therefore is not sure of how to
respond.
4. Does not favour students who have difficulty expressing their ideas and thoughts in writing.
5. They are prone to leakage.
6. Test administration is often long, boring and anxiety-provoking.
7. Essay questions encourage bluffing.
Guidelines for Constructing Essay Tests
1. Describe the specific objectives, which the questions are to measure.
2. Avoid open-ended questions.
3. Word the questions in such a way that the candidates will interpret them in the same way.
4. Give a clue as to what you expect from the students to make for uniformity of responses.
5. Be mindful of time. Attempt answering the questions you set.
6. The essay items should be of moderate difficulty, not too easy, but difficult enough to pose
a challenge.
7. Give adequate time and thought to the preparation of essay questions.
8. The questions should be written so that they elicit the type of behaviour you want to measure.


9. Phrase the questions with the action verbs appropriate to the relevant instructional objective
in the cognitive domain.
10. Do not provide optional questions on an essay test.
11. Use a relatively large number of questions requiring short answers rather than just a few
questions involving long answers.
12. Do not start essay questions with such words as list, who, what, and whether. These words
tend to elicit responses that require only a regurgitation of factual information.
13. Adapt the length of the response and the complexity of the question and answer to the
maturity level of the student.
14. Prepare a marking scheme for scoring the answers.

Construction of Objective Tests


Objective tests are tests in which every question is set in such a way as to have only one
right answer. The opinion of the examiner or marker does not come into play in judging whether
an answer is good or bad, acceptable or unacceptable, right or wrong. In other words, there is no
subjective element involved. The items are constructed in a way as to have one, predetermined
correct answer. Objective tests are called objective because similar answers by different testees
are given the same marks, no matter who did the scoring. It can be hand or machine scored. It is
also free of psychological moods, prejudices, whims and caprices of the examiner.
Types of Objective Tests
There are four major types of objective tests: Short-answer, alternate choice, matching, and
multiple-choice items.
Short-Answer Items
The short-answer item (also called the supply answer or completion item) presents a task
in a sentence in which a word, a number, a symbol, or a series of words has been omitted. The
items call for only one response for a blank or a specific series of responses for a series of blanks.
Advantages of Short-Answer Items
1. Short-answer items are useful in mathematics and the sciences, where a computational
answer is required or where a formula or equation is to be written.
2. They are useful in spelling and foreign language evaluations, where specific bits of
information are usually tested.


3. They are used to test for the knowledge of definitions, technical terms, and facts such as
dates, names, places, and vocabulary.
Disadvantages of Short-Answer Items
1. They are limited to questions that can be answered by a word, phrase, symbol, or number.
2. It is quite difficult to write good short-answer items so that one and only one answer will
be correct.
3. Excessive use of short-answer items may encourage rote memory and poor study habits.
4. Scoring can be quite tedious and somewhat subjective.

Guidelines for Writing Short-Answer Items


1. For computational problems, the teacher should specify the degree of precision and the units
of expression expected in the answer.
2. Omit only essential or important words in a sentence.
3. Avoid excessive blanks in a single item. The teacher should not eliminate so many elements
of a statement that the item becomes ambiguous and confusing.
4. The blanks are typically better placed at the end of a statement rather than the beginning.
When the blank is placed at the beginning or middle of the sentence, the essential point to
that question may be overlooked or forgotten by the time the student reads the item.
5. Make all the blanks the same size regardless of the answer.
6. To test for the knowledge of definitions and/or the comprehension of technical terms, use a
direct question in which the term is given and a definition is asked for.
7. Avoid giving irrelevant clues to the correct answer in the structure of the item.
The Alternate Choice Items
In the alternate choice item, the students are given two options from which to choose one. Such
options include yes-no, true-false, right-wrong, and correct-incorrect.
Advantages of Alternate Choice Items
1. Students are able to respond to more alternate choice items in a given time period.
2. It ensures an adequate sample of items when a great deal of subject matter must be covered.
3. They are good for young children and pupils who are poor readers.
4. They are particularly suitable for testing beliefs in popular misconceptions and superstitions.


5. When responding, students react in much the same way as they do when answering a question
in class or in a real life situation.
6. They are amenable to item analysis.
7. They are easy to score.
Disadvantages of Alternate Choice Items
1. The pupils' score may be influenced by good or bad luck in guessing.
2. Alternate choice items are highly susceptible to ambiguity and misinterpretation.
3. They lend themselves to cheating.
4. The response to items may be difficult when statements are not absolutely true or false.

Guidelines for Writing Alternate Choice Items


1. Avoid the use of specific determiners, that is, words that serve as special clues to the answer.
For example, items phrased with qualifiers such as never, all, always, and none tend to be
false. Similarly, items phrased with qualifiers such as sometimes, usually, some, and
typically tend to be true.
2. Use simple and clear language. True-false items must be based on statements that are clearly
true or clearly false.
3. Avoid trick questions. Students are under tremendous pressure when they take tests. They
read so rapidly that they may miss "tricky" wording. Moreover, the objective of giving tests
is to measure the degree of learning and not whether the student can be tricked.
4. Word the item in such a way that superficial knowledge will lead to a wrong answer.
5. Avoid lifting statements verbatim from the text. Very few textbook statements, when
isolated from context, are completely and absolutely true. Moreover, many of them are of
value chiefly as supporting or clarifying material and are not in themselves highly significant.
6. Avoid making true statements consistently longer than false statements.
7. Have approximately an equal number of true and false statements.
The Matching Items
The matching item presents two lists usually called the premises and responses. The
premises list consists of the questions or problems to be answered, while the responses list contains
the answers. Generally, the two lists have things in common; for example, list of authors and books,
inventions and inventors, historical events and dates, states and capitals, antonyms and synonyms,
words and opposites, etc. The students are directed to match each premise with the corresponding
response.
Advantages of Matching Items
1. It is suitable when one is interested in testing the knowledge of terms, definitions, dates,
events, and other matters involving simple relationships.
2. Large quantity of associated factual material can be measured in a small amount of time.
3. It is amenable to machine scoring.
Disadvantages of Matching Items
1. Matching items mainly measure information that is based on memorisation.
2. It is sometimes difficult to get clusters of questions that are sufficiently alike that a common
set of responses can be used.
Guidelines for Writing Matching Items
1. All parts of a single matching item should be homogenous in content, that is, all should refer
to dates, all to names, all to places, and so on. Be sure the student knows the basis on which
the terms should be matched.
2. Avoid having an equal number of premises and responses.
3. Arrange the responses in a systematic fashion, such as alphabetical order for words, and
ascending or descending order for dates and numbers.
4. Avoid giving extraneous irrelevant clues.
5. Maintain grammatical consistency.
6. Every response in one column should be a plausible answer to every premise in the other
column.
7. All items and options for a given matching exercise should be on a single page.
The Multiple-choice Items
The multiple-choice item consists of a stem and a branch. The stem presents the problem
as either an incomplete statement or a question, while the branch presents a list of suggested
answers (responses or options). There are usually four or five options. Among the options, only
one is the correct answer (or the key). The incorrect options are called distracters. A distracter is a
plausible but wrong answer designed to confuse the student who does not know the correct answer.
From the list of responses provided, the student is required to select the correct one (or best).
Advantages of Multiple Choice Items


1. Provides less chance of guessing the correct option.
2. Multiple-choice tests provide greater test reliability.
3. It minimises student writing and makes scoring easy.
4. Items are amenable to item analysis.
5. Scoring is objective.
6. Measures wide range of objectives from rote knowledge to complex level.
7. Teachers can construct items that require students to discriminate among options that vary
in correctness.

Disadvantages of Multiple Choice Items


1. They are very difficult to construct.
2. Of all objective tests, the multiple choice items require the most time for the students to
respond especially when very fine discriminations have to be made.
3. There is a tendency for teachers to write multiple-choice items demanding only factual recall.
Guidelines for Writing Multiple Choice Items
1. Develop a test blueprint. The blueprint specifies the instructional objectives as well as the
content to be covered by the test items.
2. Provide at least four options but not more than five. Fewer than four options increases the
chance of guessing the correct answer, while more than five options creates the problem of
getting plausible distracters.
3. There should be one, and only one, correct response. This alternative should be clearly
correct.
4. The question to be answered must emerge clearly from the stem.
5. All distracters should be plausible and attractive to students who do not know the correct
answer; yet, they should be incorrect. Distracters can be common misconceptions, frequent
errors, or other plausible but incorrect information.
6. Each item should be independent. One item should not aid in answering another item on the
test.
7. Avoid irrelevant cues to the correct answer provided by response length, repetition of key
words, common associations, or grammar.


8. The position of the correct options in a series should be based on a random pattern. Try to
scatter the position of the correct options.
9. The stem and options must be linked grammatically.
10. Positive rather than negative stems ought to be used.
11. Whenever possible, use new situations and examples. Try to avoid repeating textbook
examples and phraseology.
12. Use none of the above as an option only if there is an absolutely right answer.
13. If an item includes controversial material, cite the authority whose opinion is used.

Differences between Essay and Objective Tests


Ebel (1972) identified some significant differences between essay and objective tests.
1. An essay test question requires the student to plan his own answer and to express it in his
own words. An objective test item requires him to choose among several designated
alternatives.
2. An essay test consists of relatively few, more general questions that call for rather extended
answers. An objective test ordinarily consists of many rather specific questions requiring
only brief answers.
3. Students spend most of their time in thinking and writing when taking an essay test. They
spend most of their time reading and thinking when taking an objective test.
4. The quality of an objective test is determined by the skill of the test constructor. The quality
of an essay test is determined largely by the skill of the reader of student answers.
5. An essay examination is relatively easy to prepare but relatively tedious and difficult to score
accurately. A good objective examination is relatively tedious and difficult to prepare but
relatively easy to score accurately.
6. An essay examination affords much freedom for the student to express his individuality in
the answer he gives, and much freedom for the scorer to be guided by his individual
preferences in scoring the answer. An objective examination affords much freedom for the
test constructor to express his knowledge and values but gives the student only the freedom
to show, by the proportion of correct answers he gives, how much or how little he knows or
can do.
7. In objective test items the student's task and the basis on which the examiner will judge the
degree to which it has been accomplished are stated more clearly than they are in essay tests.


8. An objective test permits, and occasionally encourages, guessing. An essay test permits, and
occasionally encourages, bluffing.
9. The distribution of numerical scores obtained from an essay test can be controlled to a
considerable degree by the grader; on the other hand, the scores from an objective test are
determined almost entirely by the test.

Aptitude and Non-Cognitive Tests


Obe (1980) and Gibson and Mitchell (1979) define aptitude as a talent or one's potential
capacity to learn and succeed in a given activity, if trained, and also as a trait that characterizes an
individual's ability to excel in a given area or to acquire the learning necessary for performance in
a given area. Thus, aptitude is a condition or set of characteristics regarded as symptomatic of an
individual's ability to acquire, with training, some (usually specified) knowledge, skill or set of
responses.
Meanwhile, aptitude tests are those tests that measure an individual's potential to achieve
in a given activity or to learn to achieve in that activity (Gibson and Mitchell, 1979). They attempt
to predict the degree of achievement that may be expected from individuals in a particular activity.
The purpose of aptitude testing is to predict how well an individual will perform on some criterion
(such as school grades, teacher's ratings or job performances) before training or instruction is
begun or selection or placement decisions are made.
Characteristics of Aptitude Tests
1. Involves both innate and acquired abilities.
2. Measures non-deliberate or unplanned learning.
3. Measures a person's capacity to excel in the future based on present performance.
4. Aptitudes are based on present psychological factors known from empirical studies to
account for good performance in the activity under consideration.
Classification of Aptitude Tests
Horrocks (as cited in Shertzer and Linden, 1979) classifies aptitude tests into four types,
namely: differential test, component ability test, analogous test and work-sample aptitude tests.
1. Differential Test: The differential test (also known as multifactor or analytic test) assesses a
number of special abilities that compose one or more aptitudes. An examinee's performance
is measured by a battery of separate tests and analysed by scores either from each component
test or from a single total score.


2. Component Ability Test: This kind of test assesses a single special ability such as space
perception, mechanical accuracy, and spatial relation.
3. Analogous Test: Analogue tests present the basic activities of a job either by duplicating the
pattern of the job in miniature or by simulating the job.
4. Work-Sample Aptitude Test: Work-sample test requires the examinee to perform all or part
of a given job under the conditions that exist on the job. The scoring is based on the rate of
improvement and the amount of improvement after a certain period of practice. Work-sample tests
are used for recruiting and classifying job applicants.

Differences between Achievement and Aptitude Tests


1. Achievement tests measure terminal behaviour or an individual's status on completion of
training. Aptitude tests, on the other hand, are used to predict future performance.
2. Achievement tests measure the effects of relatively standardized sets of experiences, such as
summary writing, algebra, chemistry, etc. In contrast, aptitude test performance reflects the
cumulative influence of multiplicity of experience in daily living.
3. Achievement tests measure the effects of learning that occurred under partially known and
controlled conditions, while aptitude tests measure the effects of learning under relatively
uncontrolled and unknown conditions.
Uses of Aptitude Tests
1. To identify potential abilities of which the individual is not aware.
2. To identify special talents or potential abilities whose development needs to be encouraged
in an individual.
3. To provide information that may assist an individual in making educational and career
decisions or other choices between competing alternatives.
4. To serve as an aid in predicting the level of academic or vocational success an individual
might anticipate.
5. To divide students into relatively homogenous groups for instructional purposes.
6. To identify students who deserve scholarship awards.
7. To screen individuals for particular educational programmes.
8. To help guide individuals into areas in which they are most likely to succeed.


Non-Cognitive Tests: Attitude Tests


An attitude is often defined as a tendency to react favourably or unfavourably toward a
designated class of stimuli, such as a national or ethnic group, a custom, or an institution (Anastasi,
1982). Johnson (1975) defined attitudes as a combination of concepts, information, and emotions
that result in a predisposition to respond favourably or unfavourably towards particular people,
groups, ideas, events, or objects. According to Silverman (as cited in Okoli, 2005), attitudes are
well-established mental sets that predispose a person to evaluate something favourably or
unfavourably. Attitudes are learned and relatively enduring. They cannot be directly observed, but
must be inferred from overt behaviour, both verbal and non-verbal.
Characteristics of Attitudes
All attitudes are learned. Attitudes are acquired through direct instruction, by taking on the
attributes of someone a person loves or admires (identification), through personal experiences, and
by adopting social roles such as pupil, teacher, husband, wife, doctor, mechanic, etc. All attitudes
are continually open to modification and change. The learning and modification of attitudes have
their origins in interaction with other people. The interaction can be direct or indirect, through
movies, advertisements, books, or television. The acquisition and modification of attitudes is a
dynamic process. Internal and external psychological forces drive people to acquire and modify
their attitudes. The two forces are in a dynamic state of tension or equilibrium until an
accommodation is reached (Johnson, 1975).
Components of Attitudes
Attitudes have three components. These are feeling, cognitive and action components (Reily
and Lewis, 1983).
1. Feeling Component: This pertains to the emotional reaction or feeling a particular object or
person evokes in an individual. This feeling could be rational or irrational. It influences the
acceptance or rejection of anything connected to the object.
2. Cognitive Component: This refers to the internalized views held by an individual toward an
object, person, or idea.
3. Action Component: This is a predisposition to certain types of behaviour.
Ways of Expressing Attitudes


We express attitudes in three ways. Firstly, we exhibit our attitudes through our emotional
reactions, especially as they pertain to what we choose to accept or reject. Secondly, we express
attitudes through spontaneous verbal remarks and comments about people, ideas and objects.
Thirdly, we manifest our attitudes through our behaviour.
Importance of Attitudes
1. Attitudes influence our feelings and how we perceive things. They help to determine what a
person sees and how he sees it.
2. Attitudes have profound effect on school learning. A student's attitudes determine whether
he will perceive learning as pleasant or unpleasant, as important or useless and as colourless
or neutral.
3. Attitudes influence a person's concept of self and personal identifications.
4. Attitudes influence our interpersonal relationships, choice of friends, hobbies, and extra-
curricular activities.
5. Attitudes affect a person's choice of careers and school subjects.
Types of Attitude Measures
Attitude measures are classified according to whether the items disguise the nature and
purpose of the assessment and the extent to which the test is structured or unstructured.
Disguised Measures of Attitudes
An attitude scale is disguised if it contains items that appear harmless to the testee but are
designed to elicit information concerning personal attitudes likely to be withheld by the testee.
Disguised measures of attitudes include the information tests, perception and memory tests and
judgement tests.
Information Tests. On the surface, the information tests appear to measure knowledge, but they
actually measure attitudes. The use of information tests is based on the supposition that individuals
tend to answer cognitive-type items in accordance with their deep-seated and often sub-conscious
beliefs.
Perception and Memory Tests. On the assumption that memory and perception are selective, and
that individuals with different attitudes should respond differently to various stimuli, individuals are
presented with a detailed picture containing elements of the attitude to be measured for 2 to 3
seconds. Thereafter, they are asked structured or unstructured questions about the activities and
features in the picture. They may also be asked questions whose answers cannot be derived
from the picture. What the individual sees and remembers is used as a measure of his attitude
towards the elements in the picture.
Judgement Tests. Judgement tests require the respondents to make judgements using a set of
criteria that appears non-threatening, while the test actually measures more subtle aspects of attitude.
Non-Disguised Attitude Measures
The non-disguised attitude measures do not hide the purpose of the assessment from the
subjects. They are usually in the form of attitude scales. Attitude scales are self-report inventories
designed to measure the extent to which an individual has favourable or unfavourable feelings
toward some person, group, object, institution, or idea. They are used where the individual has
little reason for distorting the results. Examples include the social distance, Thurstone, and Likert
scales.
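
As a concrete illustration of how a Likert-type scale yields a single attitude score, the following sketch in Python scores a short hypothetical scale. The item names, responses, and reverse-keying are invented for illustration and are not taken from any published instrument.

    # A minimal sketch of scoring a five-point Likert attitude scale.
    # Item names, responses, and keying are hypothetical.
    # 1 = strongly disagree ... 5 = strongly agree
    responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}

    # Negatively worded items are reverse-keyed so that a high score
    # always indicates a more favourable attitude.
    reverse_keyed = {"item2", "item4"}

    def score_likert(responses, reverse_keyed, points=5):
        total = 0
        for item, value in responses.items():
            if item in reverse_keyed:
                # On a 5-point scale, 1 becomes 5, 2 becomes 4, and so on.
                value = (points + 1) - value
            total += value
        return total

    print(score_likert(responses, reverse_keyed))  # 4 + 4 + 5 + 5 = 18

A higher total is read as a more favourable attitude toward the object of the scale; the reverse-keying step is what keeps agreement with negatively worded items from inflating the score.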
Personality Tests
Personality refers to the affective or non-intellectual aspects of behaviour. Webster's Third
New International Dictionary (1983) defined personality as the integrated organisation of all the
psychological, intellectual, emotional, and physical characteristics of an individual, especially as
they are presented to others. Coleman (1960) defined personality as the individual's unique
pattern of traits that distinguishes him as an individual and accounts for his consistent way of
interacting with his environment. In other words, personality refers to the organised system of
behaviours and values that characterise a given individual and account for his particular manner
of functioning in the environment.

Ways of Assessing Personality


To assess personality we make use of personality tests. Personality tests are instruments
for measuring the affective or non-intellectual aspects of behaviour for personal counselling
(Oladele, 1987). They are used to measure such aspects of personality as emotional stability,
friendliness, motivation, dominance, interests, attitude, leadership, self-concept, sociability, and
introversion-extroversion.
In conventional psychometric terminology, personality tests are instruments for the
measurement of emotional, motivational, interpersonal, and attitudinal characteristics, as
distinguished from abilities. In the broad sense, attitudes and interests are aspects of an individual's
personality. According to Anastasi (1982), the strength and direction of the individual's interests,
attitudes, motives, and values represent an important aspect of personality.

The personality tests are used by the counsellor to survey and diagnose personality
characteristics and problems with the aim of giving the client useful information about his
personality. The knowledge of one's personality serves as a vital tool in making critical decisions
such as choice of careers and life partners. Personality tests are of two major types: paper-and-pencil
self-report inventories and projective techniques.

Self-Report Personality Inventories


Self-report inventories consist of questionnaire-type statements requiring a limited form of
response, such as might be found in true-false or multiple-choice items.

Constructing Self-Report Personality Inventories


Rational Approach: The rational or logical approach begins with the selection of items that appear
to measure some personality trait or traits. It entails a logical sampling of items from some universe,
such as problems faced by adolescents. Examples include the Mooney Problem Checklist, Edwards
Personal Preference Scale, Edwards Personality Inventory, Hare's Self-Concept Scale, Piers-Harris
Children's Self-Concept Scale, Tennessee Self-Concept Scale, Students Problem Inventory, etc.
Empirical Approach: The empirical approach to personality assessment requires that items
discriminate among various criterion groups, such as normals, neurotics, and psychotics; the
sketch below illustrates the idea. Examples include the Minnesota Multiphasic Personality
Inventory (MMPI) and the California Psychological Inventory (CPI).
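
A rough sketch of the logic of empirical keying follows, using invented endorsement rates: an item is retained only if the proportion endorsing it differs substantially between a criterion group and a normal group. The groups, items, and cut-off are hypothetical, not drawn from any actual inventory.

    # A rough sketch of empirical item selection (empirical keying).
    # The endorsement rates (proportion answering "true") are invented.
    criterion_group = {"item1": 0.80, "item2": 0.55, "item3": 0.20}
    normal_group = {"item1": 0.30, "item2": 0.50, "item3": 0.60}

    def select_items(criterion, normal, min_difference=0.25):
        # Keep only items whose endorsement rates clearly separate
        # the criterion group from the normal group.
        return [item for item in criterion
                if abs(criterion[item] - normal[item]) >= min_difference]

    print(select_items(criterion_group, normal_group))  # ['item1', 'item3']

Note that an item is kept on purely statistical grounds: it need not appear, on its face, to have anything to do with the trait, so long as the groups answer it differently.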
Factor Analysis Approach: The technique of factor analysis was developed as a means of
identifying psychological traits (Anastasi, 1982). Factor analysis is a statistical technique for
analysing the interrelationships of behavioural data and for construct validation. It is employed as
an aid in selecting items and in labelling the factors (traits or personality types); a toy illustration
follows. Examples include the Sixteen Personality Factor Questionnaire (16PF), Eysenck
Personality Questionnaire (EPQ), Junior Eysenck Personality Questionnaire (JEPQ), State-Trait
Anxiety Inventory (STAI) and the State-Trait Anger Inventory.
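
As a rough illustration of how factor analysis groups items, the sketch below runs scikit-learn's FactorAnalysis on simulated five-point responses. The data and the choice of two factors are assumptions made for illustration; with real inventory data, items loading highly on the same factor would be interpreted and labelled as one trait.

    # A toy factor analysis of simulated five-point item responses.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    # 200 respondents answering 8 items on a 1-5 scale (simulated).
    X = rng.integers(1, 6, size=(200, 8)).astype(float)

    fa = FactorAnalysis(n_components=2, random_state=0)  # two assumed factors
    fa.fit(X)

    # components_ holds each item's loading on each factor; inspecting
    # these loadings is the basis for selecting and labelling items.
    print(fa.components_.round(2))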

Projective Techniques
Projective techniques are designed to measure personality dynamics and motivation. They
consist of unstructured stimuli or tasks usually administered individually by experienced
psychologists. The use of projective techniques is predicated on the assumption that when
individuals are faced with ambiguous stimuli, they tend to project or transfer their attitudes, beliefs
and personality traits onto the stimuli (Sax, 1980). Because the purpose of the projective
techniques is ambiguous and disguised, the subject may unconsciously reveal himself as he reacts
to the external object.
Types of Projective Techniques
Association Tests
Association tests require subjects to respond as rapidly as possible to such stimuli as words
or pictures. The subjects are presented with a list of terms that are loaded with emotions or are
neutral and instructed to respond with the first word or idea that comes to mind. The amount of
time it takes the subject to react, the response itself, and any evidence of embarrassment or
hesitation indicate the extent of the subject's inner turmoil. A typical example of the association
test is the Rorschach Inkblot Test.
Rorschach Inkblot Test. Hermann Rorschach, a Swiss psychologist, developed the inkblot test. It
consists of 10 cards. On each card is printed a bilaterally symmetrical inkblot. The cards are shown
one at a time to a subject who is asked to tell what the blot could represent. The examiner keeps a
record of the responses to each card, the time of response, the position or positions in which the
cards are held, spontaneous remarks, emotional expressions and other behaviours of the subject during the test
session. The subject's responses are believed to be a reflection of his wishes, attitudes, and
conceptions of the world.
Construction Tests
The construction tests require the subjects to tell a story after examining a picture depicting
a scene, a person, or a social situation. The emphasis is not on time, but on the subject's theme and
mode of responding. A typical example of the construction test is the Thematic Apperception Test
(TAT).
Thematic Apperception Test. H. A. Murray (Sax, 1980) developed the Thematic Apperception Test
(TAT). It consists of 19 cards, each containing a vague picture in black and white, plus a blank card.
The subject is asked to construct a story indicating what led to the event shown in the picture, what
is happening now, what the characters are feeling and thinking, and what the outcome would be.
For the blank card, the subject is asked to imagine a picture on the card, describe it and make up a
story about it.

During the scoring, the examiner first identifies the "hero" (a projection of the subject's
personality) and the needs and press (social forces) acting on the hero or heroine. Thereafter, the
examiner tries to assess the importance or strength of a particular need or press for the individual.
He pays special attention to the intensity, duration, and frequency of its occurrence in different
stories. It is believed that past and present experiences, conflicts and wishes influence a person's
response to the cards.
Completion Tests
The completion tests consist of incomplete sentences, stories, cartoons, or other stimuli,
which the respondent is to complete. Examples are the Madeleine Thomas Completion Stories and
the Rosenzweig Picture-Frustration Study.
Rosenzweig Picture-Frustration Study. The test consists of 24 cartoons. Each cartoon depicts an
individual who is creating a frustration or calling attention to a frustrating condition, and an individual
who must respond to it. The examiner tries to ascertain where the subject directs the aggression
arising from the frustrating situation. The direction of the aggression could be extrapunitive
(aggression directed outward), intropunitive (aggression directed toward the self), or impunitive
(frustration evaded). The test comes in separate forms for adults, aged 18 and over; for adolescents,
aged 12 to 18; and for children, aged 4 to 11.
Expressive Tests
The expressive tests allow the respondents an active role in drawing, painting, or playing
as a means of expressing personality traits. Examples include the Draw-a-Person Test by
Goodenough and Harris and psychodrama.
Draw-a-Person Test: In the Draw-a-Person test, the individual is given paper and a pencil and
told to draw a person. Artistic excellence is not required; a freehand drawing is sufficient. After
finishing the first drawing, he or she is requested to draw a person of the sex opposite to that of the
first character. The individual may be asked to make up a story about each drawing as if the person
drawn were a character in a play or novel (Anastasi, 1982). The test is one of the popular projective
personality tests and has great potential for assessing motivation and other aspects of a client's
psychodynamic functioning. It is also used to assess maturation, psychomotor development,
intelligence and deviant behaviour.
Uses of Personality Tests
1. To assess generally common, as well as specific individual traits.

2. To identify psychological disorders in individuals.
3. To identify the job types best suited to individuals.
4. To assess adjustment of individuals in such areas as home life, social life, etc.
5. To determine individuals' level of social interaction and acceptance.
6. To determine educational success.
7. To predict interests, hobbies and vocational preferences.
8. To understand actions and reactions of individuals.
Sources of Error in Personality Assessment
The major source of error in personality assessment arises from response set. A response set
is a situation whereby the client responds to test items in a rigid manner that does not reflect his
actual disposition. There are various forms of response set. Some of them are explained below,
and a small illustrative check for the first of these follows the list.
1. Response Acquiescence. This is the tendency to respond positively to all items irrespective of
the appropriateness of some of the responses.
2. Response Desirability. This is the tendency to choose socially desirable response alternatives
even when they do not reflect the actual feelings of the client.
3. Response Deviation. This is the tendency to give unusual or uncommon responses.
4. Faking Good. This is the tendency to choose answers that portray one as possessing only positive
and attractive qualities. It is common among applicants being screened for admission into
educational and training programmes or for job recruitment.
5. Faking Bad. This is the tendency to portray oneself as unworthy of certain situations. The
individual responds to items in a manner that makes him appear psychologically disturbed. This
may be witnessed, for example, in the psychological assessment of persons arrested for
serious criminal offences.
6. Test Sophistication. This is a condition in which the client becomes very familiar with the
test items and very confident in testing situations as a result of having taken the same test
repeatedly.
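
As promised above, here is a minimal sketch of how response acquiescence might be screened for: a respondent who endorses nearly every item, including reverse-keyed ones, is flagged for closer inspection. The answers and the threshold are hypothetical and would need to be set against the design of the actual inventory.

    # A minimal check for possible response acquiescence: flag a
    # respondent who endorses nearly every item. Data and threshold
    # are hypothetical.
    answers = ["agree", "agree", "agree", "disagree", "agree",
               "agree", "agree", "agree", "agree", "agree"]

    def flag_acquiescence(answers, threshold=0.85):
        agree_rate = answers.count("agree") / len(answers)
        return agree_rate >= threshold

    print(flag_acquiescence(answers))  # True: 9 of 10 items endorsed

A flag of this kind does not prove the record is invalid; it only marks it for the examiner's judgement.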

References
Abiodun-Oyebanji, O. J. (2017). Variables: Types, Uses and Definition of Terms. In A. O. Jaiyeoba, A. O. Ayeni & A. I. Atanda (Eds.), Research in Education.

Adegun, J. A. (2005). Variables in Educational Research. In S. O. Bandele, R. O. Seweje & M. F. Alonge (Eds.). Lagos: Premier Publishers.

Adeleke, J. O. (2010). The Basics of Research and Evaluation Tools. Lagos: Somerest Ventures.

Aderounmu, O. & Duyilemi, D. (1988). Element of Educational Research. Lagos: Okanlawon Publishers.

Anastasi, A. (1982). Psychological Testing. New York: Macmillan Publishing Co., Inc.

Aworh, O. C., Babalola, J. B., Gbadegesin, A. S., Isiugo-Abanihe, I. M., Oladiran, E. G. & Okunmadewa, F. Y. (2006). Design and Development of Conceptual Framework in Research. In A. I. Olayinka, V. O. Taiwo, A. Raji-Oyeladi & I. P. Farai (Eds.), Methodology of Basic and Applied Research (2nd ed.). Ibadan: Postgraduate School, UI.

Bandele, S. O. (2004). Educational Researching Perspectives. Ibadan: Niyi Communication and Printing Ventures.

Best, J. W. (1980). Research in Education. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

Best, J. W. & Kahn, J. V. (1986). Research in Education. New Delhi: Prentice Hall of India Private Limited.

Brown, F. G. (1976). Principles of Education and Psychological Testing. New York: Rinehart and Winston.

Brown, F. G. (1983). Principles of Education and Psychological Testing. London: Oxford University Press.

Chase, C. I. (1978). Measurement for Educational Evaluation. Boston: Allyn and Bacon.

Cohen, L., Manion, L. & Morrison, K. (2007). Research Methods in Education (6th ed.). London and New York: Routledge (Taylor and Francis Group).

Ebel, R. L. (1965). Measuring Educational Achievement. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

Ebel, R. L. (1979). Essentials of Educational Measurement. New York: Prentice Hall, Inc.

Egbule, J. F. (2008). Fundamentals of Test, Measurement and Evaluation (2nd ed.). Lagos – Nigeria: Havilah Functional Publishers.

Gibson, J. T. (1980). Educational Psychology for the Classroom. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.

Gibson, R. L. & Mitchell, M. H. (1981). Introduction to Guidance. New York: Macmillan Publishing Co., Inc.

Glen, S. (n.d.). Kuder-Richardson 20 (KR-20) & 21 (KR-21). StatisticsHowTo.com: Elementary Statistics for the Rest of Us! Retrieved 2nd March, 2021, from https://www.statisticshowto.com/kuder-richardson/

Gronlund, N. C. (1981). Measurement and Evaluation in Teaching. New York: Macmillan Publishers.

Ipaye, T. (1982). Continuous Assessment of Schools. Ilorin – Nigeria: Ilorin University Press.

Itsuokor, D. E. (1986). Essentials of Test and Measurement. Ilorin – Nigeria: Wove and Son Ltd.

Johnson, R. (1975). Attitude Change Methods. In F. H. Kanfer & A. P. Goldstein (Eds.), Helping People Change: A Textbook of Methods. New York: Pergamon Press.

Kalof, L., Dan, A. & Dietz, T. (2008). Essentials of Social Research. Berkshire, England: Open University.

Karmel, L. J. & Karmel, M. O. (1998). Measurement and Evaluation in Schools. New York: Macmillan Publishers Co. Inc.

Lien, A. J. (1976). Measurement and Evaluation of Learning. New York: Brownxo Publishers.

Mehrens, W. A. & Lehmann, I. J. (1978). Measurement and Evaluation in Educational Psychology. New York: Rinehart, Holt & Winston.

Nwana, O. C. (1979). Educational Measurement for Teachers. Lagos – Nigeria: Thomas-Nelson.

Nwankwo, J. I. & Emunemu, B. O. (2014). Handbook on Research in Education and the Social Sciences. Ibadan: Giraffe Books.

Obe, E. O. (1980). Educational Testing in West Africa. Lagos: Premier Press & Publishers.

Okobia, D. O. (1990). Construction and Validation of Social Studies Achievement Test (SSAT) for JSS II Students in Selected Secondary Schools in Bendel State. Unpublished M.Ed. project, University of Benin, Benin City – Nigeria.

Okoli, C. E. (2005). Introduction to Educational and Psychological Measurement. Lagos – Nigeria: Behenu Press and Publishers.

Reily, R. R. & Lewis, E. C. (1983). Educational Psychology: Applications for Classroom Learning and Instruction. New York: Macmillan Publishing Co., Inc.

Shertzer, B. & Linden, J. (1979). Fundamentals of Individual Appraisal. Boston: Houghton Mifflin Company.

Uzoagulu, A. E. (1998). Practical Guide to Writing Research Project Reports in Tertiary Institutions. Enugu: John Jacobs Classic Publisher Ltd.