EDU 303 - Lecture Note
EDU 303
S. Y. Tsagem, PhD
Dept. of Educational Foundations, FEES
Usmanu Danfodiyo University Sokoto
July, 2023
Test and Measurement | Tsagem, S. Y. (2023)
Table of Specification
A table of specification, or test blueprint, is a two-dimensional table or chart showing
the test objectives and the content to be tested or measured. A test specification of this nature guides
the test constructor with respect to the content of the test and the objectives to be measured. The
prepared two-way grid, often called a test blueprint, helps the test constructor to relate the
content or topics to the objectives already stated in the curriculum when an achievement test is
being constructed.
In the table of specification, the test constructor shows the number of questions or test items
and the percentage of items for each topic in the blueprint, taking into consideration all the steps
postulated by Ebel (1979) which have been stated above. At this point, items or questions are
drawn, rigidly following the plan set out in the table of specification. Note that the table of specification is
highly essential in ensuring the content validity of the test. Consider the following illustration adapted
from Nwana (1979:21).
Table 1: Table of Specification for Geography of Africa

Contents                    Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
                              (40%)        (25%)        (15%)       (10%)      (5%)       (5%)
Political Division (30%)        24           15            9           6          3          3        60
Ethnic Groups (30%)             24           15            9           6          3          3        60
Political Features (15%)        12            7            5           3          1          2        30
Climatic Zone (15%)             12            8            4           3          2          1        30
Economic Geography (10%)         8            5            3           2          1          1        20
Total                           80           50           30          20         10         10       200

Source: Nwana (1979:21)
In the above illustration, some of the calculations were rounded off to the nearest whole numbers.
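Each cell entry in the table is obtained by multiplying the content weight by the objective weight and the total number of items. The following is a minimal Python sketch of that calculation, using the weights from the illustration; because exact cell values ending in .5 round inconsistently, the examiner adjusts a few cells by hand so that every row still sums to its target, as the note above indicates.

```python
# Sketch: generating cell entries for a table of specification.
# Weights are taken from the Nwana (1979) illustration above.
content_weights = {
    "Political Division": 0.30,
    "Ethnic Groups": 0.30,
    "Political Features": 0.15,
    "Climatic Zone": 0.15,
    "Economic Geography": 0.10,
}
objective_weights = {
    "Knowledge": 0.40, "Comprehension": 0.25, "Application": 0.15,
    "Analysis": 0.10, "Synthesis": 0.05, "Evaluation": 0.05,
}
total_items = 200

for content, cw in content_weights.items():
    # Each cell = content weight x objective weight x total number of items,
    # rounded off to the nearest whole number.
    row = {obj: round(cw * ow * total_items) for obj, ow in objective_weights.items()}
    print(f"{content}: {row} (row total {sum(row.values())})")
```

For the 15% rows, values such as 15% x 25% x 200 = 7.5 must be rounded one way or the other, which is why the published table splits them differently (7 in one row, 8 in the other) to keep each row total at 30.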
skills and content categories as stipulated in the test plan, the number of items must be appropriate
for the time limit, the item difficulty level and discriminating power must be established; a wide
distribution of scores must be obtained, all the aspects of the test must be equated to each other
(Brown, 1976). The items will then receive the final blessing of the test constructor and other
specialists before they are finally administered to a representative sample of the students' population
(Lien, 1976).
Test Manual
There is a consensus among test constructors that the final step in test construction and
validation is the preparation of a test manual. The test manual, according to Okobia (1990), contains
a detailed description of the test, how the test was constructed and its main qualities and
characteristics.
Most test manuals contain information on how the test should be administered. The manual
should also contain direction on how the test should be scored and interpreted as well as the
necessary materials that can aid the test users in using the test scores or results (Ipaye, 1982;
Brown, 1983).
a) Content validity;
b) Face validity;
c) Construct validity;
d) Concurrent validity; and
e) Predictive validity.
(a) Content validity: This is concerned with the extent to which test items are the representative
sample of all the possible items that measure the subject matter in any curriculum area (Chase,
1978). In establishing the content validity of a test, the compiler of the test requires the use of the
blueprint or a table of specification and, more importantly, the consensus of expert judgement on
the area that is being tested.
(b) Face Validity: This implies the cosmetics or physical appearance of the test with respect to
the format for presenting the test items, the typing, and the general outlook of the test.
The evidence of face validity is the recognition of a test in terms of what it seeks to measure; for
example, the ability to recognise a mathematics test because of the presence of numbers and other
mathematical signs and symbols, such that a mathematics test cannot be said to be an English test or
a chemistry test.
(c) Construct Validity: A construct per se is a human characteristic that may be psychological or
sociological in nature. Concepts like intelligence, attitude, and aptitude are examples of constructs.
Therefore, construct validity is the extent to which the basic constructs in a particular test are
measured. Thus, a test is construct valid if it truly measures the constructs that it is supposed to
measure.
(d) Concurrent Validity: This is a form of criterion related validity. It shows the extent to which
the scores of a particular test are related to those of another similar test. The concurrent validity of a test is
established by correlating the scores of the test that is being constructed with the scores of an
already standardized test. The higher the co-efficient of correlation, the higher the level of
concurrent validity.
(e) Predictive Validity: A test possesses predictive validity if it is capable of predicting the future
outcome or performance of testees. For instance, the University Matriculation Examination (UME)
has predictive validity if it is capable of predicting the performance of students who eventually
enter the university, such that their UME scores correlate with their academic performance during
their university programme.
Reliability
This is the ability of a test to measure consistently when administered to the same person
or group of persons at different times. To be sure, a test is reliable if it generates the same or related
scores if it is administered on more than one occasion to the same group of persons.
There are different ways of establishing, or determining, the reliability of a test. These
include:
a) Test-retest method;
b) Split-half method;
c) Parallel or equivalent form method; and
d) Kuder-Richardson's method.
In using any of these methods or techniques, efforts are made to determine the coefficient
of relationship between scores generated on different occasions. A high positive relationship
implies high reliability. Usually the coefficient of relationship ranges from -1 to +1, where
-1 is a perfect negative correlation and +1 is a perfect positive correlation.
(a) Test-retest Method: This method involves administering the same test on two separate occasions
after some interval (test 1 and test 2). When this is done, the correlation between test 1
and test 2 is calculated. On this basis, the test is considered reliable if the computed coefficient
of reliability is greater than 0.50. Test-retest techniques measure consistency over a period of time.
Apart from the ease of administration and procedure, the method is limited in the area of logistics
due to sample mortality and the task involved in arranging for the second testing. More
significantly, the scores generated in the second testing may not be valid, since the testees are
already familiar with the test items administered before; the students have become test-wise.
(b) Parallel or Equivalent Form Method: The constraints of administering the test twice on different
occasions are, no doubt, very cumbersome. The equivalent form method therefore allows the
test to be administered once, in different forms. The method involves administering two or more parallel forms of a
test that have been produced in such a way that scores on these alternate forms are likely
to be equivalent. The selected samples in the group are given both forms of the test, and
the correlation between the scores on the two forms is then computed. The calculated
coefficient provides an estimate of the reliability.
(c) Split-Half Method: In this case, only one test is administered once to a group of testees.
However, in scoring the test, the scores of the testees are organised into two halves, the first half
being the total scores on the even-numbered items and the second half being the total scores on the
odd-numbered items. Once this is done, the correlation between the two sets of scores is computed.
The computed correlation co-efficient is subjected to further statistical manipulation. This
is necessary because only one half of the test is involved. Therefore, to obtain the reliability of the
entire test, Spearman-Brown prophecy formula is employed.
        R2 = 2R1 / (1 + R1)

Where R2 = reliability of the whole test
      R1 = reliability of the half test
      1 and 2 are constants

Suppose R1 = 0.80

        R2 = (2 x 0.80) / (1 + 0.80)
        R2 = 1.60 / 1.80
        R2 = 0.89
(d) Kuder-Richardson Method: Kuder-Richardson Formula 20, or KR-20, is a measure of reliability
for a test with binary items (i.e. answers that are right or wrong). Reliability refers to how
consistent the results from the test are, or how well the test is actually measuring what you want it
to measure.
The KR-20 is used for items that have varying difficulty. For example, some items might
be very easy, others more challenging. It should only be used if there is a correct answer for each
question; it shouldn't be used for questions with partial credit or for scales like the
Likert scale. If all questions in your binary test are equally challenging, use the KR-21. If you
have a test with more than two answer possibilities (or opportunities for partial credit), use
Cronbach's Alpha instead (Stephanie, n.d.).
KR-20 Scores
The scores for KR-20 range from 0 to 1, where 0 is no reliability and 1 is perfect reliability. The
closer the score is to 1, the more reliable the test. Just what constitutes an “acceptable” KR-20
score depends on the type of test. In general, a score of above 0.5 is usually considered reasonable.
The formula is: KR-20 = [n/(n-1)] * [1 - (Σp*q)/Var]
Where:
n = number of items in the test,
Var = variance of the total scores on the test,
p = proportion of people passing the item,
q = proportion of people failing the item,
Σ = sum up (add up). In other words, multiply each item's p by its q, and then
add all these products up. If you have 10 items, you'll compute p*q ten times,
then add those ten products up to get a total.
KR-21
The KR-21 is similar, except it’s used for a test where the items are all about the same difficulty.
The formula is [n/(n-1)] * [1 - (M*(n-M))/(n*Var)]
Where:
n = number of items in the test,
Var = variance of the total scores on the test,
M = mean score on the test.
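KR-21 needs only the number of items and the mean and variance of the total scores, as this sketch with hypothetical totals shows.

```python
# Hypothetical total scores of five testees on a four-item binary test.
from statistics import mean, pvariance

n = 4                     # number of items in the test
totals = [3, 3, 2, 4, 1]  # total score per testee

m = mean(totals)
var = pvariance(totals)

kr21 = (n / (n - 1)) * (1 - (m * (n - m)) / (n * var))
print(f"KR-21 = {kr21:.2f}")
```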
Usability
Usability of a test implies the extent to which a test can be used with a minimum expenditure of
time, energy and other resources. In other words, a test is usable if it is economical to
administer and score in terms of cost, time, space, materials and personnel.
Objectivity
Objectivity in testing implies the absence of human bias. A test is objective if the scores
obtained by individuals from different cultural and geographical background do not differ
significantly. A test is, therefore, objective if it does not favour any person or group of persons on
the basis of race and other factors.
On the whole, a test is considered good if it is valid, reliable, usable and objective.
Item Analysis
Item analysis is the process of examining the students' responses to each test item in order to
judge the quality of the items. It is a statistical technique of reviewing every item on the test with
a view to refining the whole test (Obe, 1980). The technique helps us not only to identify poor
items, but also to decide why an item is not functioning as planned. Items can be analysed
qualitatively, in terms of their content and form, and quantitatively, in terms of their statistical
properties (Anastasi, 1982). Qualitative analysis includes the consideration of content validity and
the evaluation of items in terms of effective item-writing procedures. Quantitative analysis
principally includes the measurement of item difficulty and item discrimination, which are
examined for each item in turn below.
Item Difficulty
The item difficulty pertains to the easiness of the item. A good item should be neither too
easy nor too difficult. The difficulty index of an item is the proportion of the testees who got the
item right. The difficulty index ranges from zero to one (or from zero to 100%). Items whose
difficulty index ranges from 20 to 80 percent are acceptable. The best difficulty index is 0.5. An
item should not be so difficult that almost all the testees miss it, or so easy that every testee gets
it right. The formula for item difficulty (P) is:

        P = (U + L)/N x 100
        D = (U - L) / (1/2 N)

Where D = item discriminating power
      U = number of students in the upper group who got the item right
      L = number of students in the lower group who got the item right
      N = number of students in the item analysis group
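The two indices can be written as small Python functions, with U, L and N as defined above; this is only a sketch of the calculations.

```python
def difficulty(U, L, N):
    """Item difficulty P: percentage of the analysis group who got the item right."""
    return (U + L) / N * 100

def discrimination(U, L, N):
    """Item discriminating power D, with upper and lower groups of N/2 each."""
    return (U - L) / (N / 2)

# Item 2 of the worked example below: U = 14, L = 9, N = 40.
print(round(difficulty(14, 9, 40), 1))  # 57.5
print(discrimination(14, 9, 40))        # 0.25
```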
Effectiveness of Distracters
After identifying the poor test items, such as items that are too easy, too difficult, or those
with zero or negative discrimination, there is a need to ascertain what is wrong with them.
Analysing the effectiveness of distracters entails a comparison of the responses of students in the upper
and lower groups. In doing this, the following points should be kept in mind:
1. Each distracter should be selected by about an equal number of students in the lower group.
2. Substantially more students in the upper group than in the lower group should respond to the
   correct alternative.
3. Substantially more students in the lower group should respond to each distracter. If a distracter
   confuses only the upper group, it is likely faulty. If it distracts both groups, replace it. This
   could be because of a vague or ambiguous question, or because the examiner marked with the
   wrong key.
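A minimal sketch of such a tally, with hypothetical option choices for a single item whose key is option C:

```python
# Hypothetical choices of ten upper-group and ten lower-group students
# on one multiple-choice item; "C" is the keyed (correct) answer.
from collections import Counter

upper = Counter(["C", "C", "B", "C", "C", "A", "C", "C", "D", "C"])
lower = Counter(["A", "B", "C", "D", "B", "C", "A", "D", "B", "C"])
key = "C"

for option in "ABCD":
    label = "key" if option == key else "distracter"
    print(f"{option} ({label}): upper = {upper[option]}, lower = {lower[option]}")

# Here the key draws far more upper- than lower-group students, and each
# distracter draws mainly from the lower group, so the item looks healthy.
```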
Example:
Items 1, 2, 3, 4, 5, ..., 48, 49 and 50 on a Mathematics achievement test were passed by the
following numbers of "upper" and "lower" students:

Item Number            1    2    3    4    5   ...  48   49   50
20 "Upper" Students   18   14    5   20   14         8   12   10
20 "Lower" Students   16    9    2   20   11        16   10    5

Obtain the item difficulty and item discriminating power per item and comment.

Solution:

Item  Upper  Lower  Item Difficulty          Item Discriminating        Remarks
No    (U)    (L)    P = (U + L)/N x 100      Power D = (U - L)/(1/2 N)
1      18     16    34/40 x 100 = 85%        2/20 = 0.10                Too easy (Reject)
2      14      9    23/40 x 100 = 57.5%      5/20 = 0.25                Good (Accept)
3       5      2    7/40 x 100 = 17.5%       3/20 = 0.15                Too difficult (Reject)
4      20     20    40/40 x 100 = 100%       0/20 = 0                   Too easy (Reject)
5      14     11    25/40 x 100 = 62.5%      3/20 = 0.15                Good (Accept)
48      8     16    24/40 x 100 = 60%        -8/20 = -0.40              Negative discrimination (Reject)
49     12     10    22/40 x 100 = 55%        2/20 = 0.10                Good (Accept)
50     10      5    15/40 x 100 = 37.5%      5/20 = 0.25                Fair (Accept)
Based on the calculations above, items 2, 5, 49 and 50 are satisfactory because their
difficulty levels and discriminating powers are within acceptable ranges. Item 3 should be
rejected because it is too difficult. Similarly, items 1 and 4 are too easy and should be rejected.
Even though item 48 has a good difficulty level, its discriminating index is negative; it should
therefore be discarded. The examiner should undertake item choice analysis to determine why
items 4 and 48 recorded zero and negative discriminating power respectively. Either the
questions were vague or the examiner may have used a wrong key in marking the items.
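The whole solution table can be reproduced programmatically. The sketch below applies the two formulas and the decision rules used above (difficulty between 20% and 80%, positive discrimination) to the same data:

```python
# (item number, U, L) for the eight items shown, with N = 40 (20 per group).
data = [(1, 18, 16), (2, 14, 9), (3, 5, 2), (4, 20, 20),
        (5, 14, 11), (48, 8, 16), (49, 12, 10), (50, 10, 5)]
N = 40

for item, U, L in data:
    P = (U + L) / N * 100   # item difficulty, as a percentage
    D = (U - L) / (N / 2)   # item discriminating power
    if P > 80:
        remark = "too easy (reject)"
    elif P < 20:
        remark = "too difficult (reject)"
    elif D <= 0:
        remark = "zero or negative discrimination (reject)"
    else:
        remark = "accept"
    print(f"Item {item}: P = {P:.1f}%, D = {D:.2f}, {remark}")
```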
Measurement Instruments
Measuring instruments are psychological tools designed to elicit one form of behaviour or
the other. There are essentially two main categories of measuring instruments:
1. Test techniques, and
2. Non-test techniques
Test Techniques: These are designed in the form of questions, exercises, or puzzles to which the testee
is required to provide answers. There are specified answers to these questions or puzzles.
Test techniques are very important in measuring the cognitive domain. Examples of test techniques
include achievement tests, intelligence tests, and aptitude tests.
Non-Test Techniques: These, on the other hand, are instruments used in predicting or verifying a
behaviour trait. They are designed in the form of rating scales, checklists, questionnaires, etc.
Sometimes, observational techniques and interviews are regarded as non-test techniques.
Non-test techniques are mostly used in measuring non-cognitive behaviour domains such
as the affective and psychomotor domains.
Standardised achievement tests are those tests whose procedure for administration and scoring has been made uniform or
standard. This uniformity makes it possible to compare the scores of students from different parts
of the country (Gibson, 1980; Reily and Lewis 1983). Tasks in the standardised achievement tests
are defined and are common for all students across all schools.
All items in standardised achievement tests are written by professional test developers, and
have been tried out and found to be satisfactory by the criteria of item analysis (Chase, 1978).
There is usually a manual of instruction on administration and scoring. In addition, the test must
have been pre-administered to a sample, known as the standardisation sample, in order to obtain
norms or standards by which the scores on the test may be interpreted. Moreover, standardised
achievement tests have, almost without exception, substantial reliabilities in or near the 0.90 range.
9. Provides basis for promotion of students to next class level or for retention of students.
10. Provides information needed in making educational and career decisions.
discussion, type of organisation or approach to the proposed task. However, if the testee is left
completely free with regard to scope, problems could arise during marking. To forestall this,
some restrictions are included; for example: discuss the history of Nigeria from 1960 to 1990.
Advantages of Essay tests
1. Items can be written to allow students maximum freedom to respond.
2. They are useful in testing higher cognitive behaviours such as analysis, synthesis and
evaluation.
3. They require less time to construct.
4. They give students the opportunity to develop writing skills.
5. They minimise guesswork and cheating.
6. They measure writing, expression, memory, organisation, problem solving and originality.
7. There is economy of question papers.
Disadvantages of Essay Tests
1. There is poor item sampling, especially in the extended response type of essay test.
2. It is difficult to write essay items with high reliability.
3. The student does not always understand the questions and therefore is not sure of how to
respond.
4. Does not favour students who have difficulty expressing their ideas and thoughts in writing.
5. They are prone to leakage.
6. Test administration is often long, boring and anxiety-provoking.
7. Essay questions encourage bluffing.
Guidelines for Constructing Essay Tests
1. Describe the specific objectives, which the questions are to measure.
2. Avoid open-ended questions.
3. Word the questions in such a way that the candidates will interpret them in the same way.
4. Give a clue as to what you expect from the students to make for uniformity of responses.
5. Be mindful of time. Attempt answering the questions you set.
6. The essay items should be of moderate difficulty, not too easy, but difficult enough to pose
a challenge.
7. Give adequate time and thought to the preparation of essay questions.
8. The questions should be written so that they elicit the type of behaviour you want to measure.
9. Phrase the questions with the action verbs appropriate to the relevant instructional objective
in the cognitive domain.
10. Do not provide optional questions on an essay test.
11. Use a relatively large number of questions requiring short answers rather than just a few
questions involving long answers.
12. Do not start essay questions with such words as list, who, what, and whether. These words
tend to elicit responses that require only a regurgitation of factual information.
13. Adapt the length of the response and the complexity of the question and answer to the
maturity level of the student.
14. Prepare a marking scheme for scoring the answers.
3. They are used to test for the knowledge of definitions, technical terms, and facts such as
dates, names, places, and vocabulary.
Disadvantages of Short-Answer Items
1. They are limited to questions that can be answered by a word, phrase, symbol, or number.
2. It is quite difficult to write good short-answer items so that one and only one answer will
   be correct.
3. Excessive use of short-answer items may encourage rote memory and poor study habits.
4. Scoring can be quite tedious and somewhat subjective.
5. When responding, students react in much the same way as they do when answering a question
in class or in a real life situation.
6. They are amenable to item analysis.
7. They are easy to score.
Disadvantages of Alternate Choice Items
1. The pupils' score may be influenced by good or bad luck in guessing.
2. Alternate choice items are highly susceptible to ambiguity and misinterpretation.
3. They lend themselves to cheating.
4. The response to items may be difficult when statements are not absolutely true or false.
words and opposite, etc. The students are directed to match each premise with the corresponding
response.
Advantages of Matching Items
1. It is suitable when one is interested in testing the knowledge of terms, definitions, dates,
events, and other matters involving simple relationships.
2. Large quantity of associated factual material can be measured in a small amount of time.
3. It is amenable to machine scoring.
Disadvantages of Matching Items
1. Matching items mainly measure information that is based on memorisation.
2. It is sometimes difficult to get clusters of questions that are sufficiently alike that a common
set of responses can be used.
Guidelines for Writing Matching Items
1. All parts of a single matching item should be homogeneous in content, that is, all should refer
to dates, all to names, all to places, and so on. Be sure the student knows the basis on which
the terms should be matched.
2. Avoid having an equal number of premises and responses.
3. Arrange the responses in a systematic fashion, such as alphabetical order for words, and
   ascending or descending order for dates and numbers.
4. Avoid giving extraneous irrelevant clues.
5. Maintain grammatical consistency.
6. Every response in one column should be a plausible answer to every premise in the other
   column.
7. All items and options for a given matching exercise should be on a single page.
The Multiple-choice Items
The multiple-choice item consists of a stem and a branch. The stem presents the problem
as either an incomplete statement or a question, while the branch presents a list of suggested
answers (responses or options). There are usually four or five options. Among the options, only
one is the correct answer (or the key). The incorrect options are called distracters. A distracter is a
plausible but wrong answer designed to confuse the student who does not know the correct answer.
From the list of responses provided, the student is required to select the correct (or best) one.
Advantages of Multiple Choice Items
8. The position of the correct options in a series should be based on a random pattern. Try to
scatter the position of the correct options.
9. The stem and options must be linked grammatically.
10. Positive rather than negative stems ought to be used.
11. Whenever possible, use new situations and examples. Try to avoid repeating textbook
examples and phraseology.
12. Use none of the above as an option only if there is an absolutely right answer.
13. If an item includes controversial material, cite the authority whose opinion is used.
8. An objective test permits, and occasionally encourages, guessing. An essay test permits, and
occasionally encourages, bluffing.
9. The distribution of numerical scores obtained from an essay test can be controlled to a
considerable degree by the grader; on the other hand, the scores from an objective test are
determined almost entirely by the test.
2. Component Ability Test: This kind of test assesses a single special ability such as space
perception, mechanical accuracy, and spatial relation.
3. Analogous Test: Analogue tests present the basic activities of a job either by duplicating the
pattern of the job in miniature or by simulating the job.
4. Work-Sample Aptitude Test: Work-sample test requires the examinee to perform all or part
of a given job under the conditions that exist on the job. The scoring is based on rate of
improvement and amount of improvement after a certain period of practice. Work-sample tests
are used for recruiting and classifying job applicants.
We express attitudes in three ways. Firstly, we exhibit our attitudes through our emotional
reactions, especially as they pertain to what we choose to accept or reject. Secondly, we express
attitudes through spontaneous verbal remarks and comments about people, ideas and objects.
Thirdly, we manifest our attitudes through our behaviour.
Importance of Attitudes
1. Attitudes influence our feelings and how we perceive things. They help to determine what a
   person sees and how he sees it.
2. Attitudes have profound effect on school learning. A student's attitudes determine whether
he will perceive learning as pleasant or unpleasant, as important or useless and as colourless
or neutral.
3. Attitudes influence a person's concept of self and personal identifications.
4. Attitudes influence our interpersonal relationships, choice of friends, hobbies, and extra-
curricular activities.
5. Attitudes affect a person's choice of careers and school subjects.
Types of Attitude Measures
Attitude measures are classified according to whether the items disguise the nature and
purpose of the assessment and the extent to which the test is structured or unstructured.
Disguised Measures of Attitudes
An attitude scale is disguised if it contains items that appear harmless to the testee but are
designed to elicit information concerning personal attitudes likely to be withheld by the testee.
Disguised measures of attitudes include the information tests, perception and memory tests and
judgement tests.
Information Tests. On the surface, the information tests appear to measure knowledge, but they
actually measure attitudes. The use of information tests is based on the supposition that individuals
tend to answer cognitive-type items in accordance with their deep-seated and often sub-conscious
beliefs.
Perception and Memory Tests. On the assumption that memory and perception are selective, and
individuals with different attitudes should respond differently to various stimuli, individuals are
presented with a detailed picture containing elements of the attitude to be measured for 2 to 3
seconds. Thereafter they are asked structured or unstructured questions about the activities and
features in the picture. They may also be asked questions whose answers cannot be derived
from the picture. What the individual sees and remembers are used as measures of his attitude
towards the elements in the picture.
Judgement Tests. Judgement tests require the respondents to make judgements using one set of
criteria that appear to be non-threatening while actually measuring more subtle aspects of attitude.
Non-Disguised Attitude Measures
The non-disguised attitude measures do not hide the purpose of the assessment from the
subjects. They are usually in the form of attitude scales. Attitude scales are self-report inventories
designed to measure the extent to which an individual has favourable or unfavourable feelings
toward some persons, group, objects, institution or idea. They are used where an individual has
little reason for distorting the results. Examples include the social distance, Thurstone and Likert
scales.
Personality Tests
Personality refers to the affective or non-intellectual aspects of behaviour. Webster's Third
New International Dictionary (1983) defined personality as the integrated organisation of all the
psychological, intellectual, emotional, and physical characteristics of an individual, especially as
they are presented to others. Coleman (1960) had defined personality as the individual's unique
pattern of traits that distinguishes him as an individual and accounts for his consistent way of
interaction with his environment. In other words, personality refers to the organised system of
behaviours and values that characterise a given individual and account for his particular manner
of functioning in the environment.
The personality tests are used by the counsellor to survey and diagnose personality
characteristics and problems with the aim of giving the client useful information about his
personality. The knowledge of one's personality serves as a vital tool in making critical decisions
such as choice of careers and life partners. Personality tests are of two major types: paper-and-pencil
self-report inventories and projective techniques.
Projective Techniques
Projective techniques are designed to measure personality dynamics and motivation. They
consist of unstructured stimuli or tasks usually administered individually by experienced
psychologists. The use of projective techniques is predicated on the assumption that when
individuals are faced with ambiguous stimuli, they tend to project or transfer their attitudes, beliefs
and personality traits onto the stimuli (Sax, 1980). Because the purpose of the projective
techniques is ambiguous and disguised, the subject may unconsciously reveal himself as he reacts
to the external object.
Types of Projective Techniques
Association Tests
Association tests require subjects to respond as rapidly as possible to such stimuli as words
or pictures. The subjects are presented with a list of terms that are loaded with emotions or are
neutral and instructed to respond with the first word or idea that comes to mind. The amount of
time it takes the subject to react, the response itself, and any evidence of embarrassment or
hesitation indicate the extent of the subject's inner turmoil. A typical example of the association
test is the Rorschach Inkblot Test.
Rorschach Inkblot Test. Hermann Rorschach, a Swiss psychologist, developed the inkblot test. It
consists of 10 cards. On each card is printed a bilaterally symmetrical inkblot. The cards are shown
one at a time to a subject who is asked to tell what the blot could represent. The examiner keeps
record of the responses to each card, time of response, position or positions the cards are held,
spontaneous remarks, emotional expressions and other behaviours of the subject during the test
session. The subject's responses are believed to be a reflection of his wishes, attitudes, and
conceptions of the world.
Construction Tests
The construction tests require the subjects to tell a story after examining a picture depicting
a scene, a person, or a social situation. The emphasis is not on time, but on the subject's theme
and mode of responding. A typical example of the construction test is the Thematic Apperception
Test (TAT).
Thematic Apperception Test. H.A. Murray (Sax, 1980) developed the Thematic Apperception Test
(TAT). It consists of 19 cards, each containing a vague black-and-white picture, plus one blank card.
The subject is asked to construct a story indicating what led to the event shown in the picture, what
is happening now, what the characters are feeling and thinking and what the outcome would be.
For the blank card, the subject is asked to imagine a picture on the card, describe it and make up a
story about it.
During the scoring, the examiner first identifies the "hero" (projection of the subject's
personality), the needs and press or social forces acting on the hero or heroine. Thereafter, the
examiner tries to assess the importance or strength of a particular need or press for the individual.
He pays special attention to the intensity, duration, and frequency of its occurrence in different
stories. It is believed that past and present experiences, conflicts and wishes influence a person's
response to the cards.
Completion Tests
The completion tests consist of incomplete sentences, stories, cartoons, or other stimuli,
which the respondent is to complete. Examples are Madeleine Thomas Completion Stories and
Rosenzweig Picture-Frustration Study.
Rosenzweig Picture-Frustration Study. The test consists of 24 cartoons. Each cartoon depicts an
individual who is creating a frustration or calls attention to a frustrating condition and an individual
who must respond to it. The examiner tries to ascertain where the subject directs the aggression
arising from the frustrating situation. The direction of the aggression could be extrapunitive
(aggression directed outward), intropunitive (aggression directed toward the self), or impunitive
(frustration evaded). The test comes in separate forms for adults, aged 18 and over; for
adolescents, aged 12 to 18; and for children, aged 4 to 11.
Expressive Tests
The expressive tests allow the respondents an active role in drawing, painting, or playing
as a means of expressing personality traits. Examples include the Draw-a-Person Test by
Goodenough and Harris and psychodrama.
Draw-a-Person Test: In the Draw-a-Person test, the individual is given paper and a pencil and
told to draw a person. Artistic excellence is not required; freehand drawing is sufficient. After
finishing the first drawing, he or she is requested to draw a person of the opposite sex from that
of the first character. The individual may be asked to make up a story about each drawing as if the
figure were a character in a play or novel (Anastasi, 1982). The test is one of the popular
projective personality tests, with great potential for assessing motivations and other aspects of a
client's psychodynamic functioning. It is also used to assess maturation, psychomotor
development, intelligence and deviant behaviour.
Uses of Personality Tests
1. To assess common as well as specific individual traits.
References
Abiodun-Oyebanji, O. J. (2017). Variables: Types, Uses and Definition of Terms. In A. O.
Jaiyeoba, A. O. Ayeni & A. I. Atanda (Eds.), Research in Education.
Adeleke, J.O. (2010). The Basics of Research and Evaluation Tools. Lagos: Somerest Ventures.
Anastasi, A. (1982). Psychological Testing. New York: Macmillan Publishing Co., Inc.
Aworh, O. C., Babalola, J. B., Gbadegesin, A. S., Isiugo-Abanihe, I. M., Oladiran, E. G. &
Okunmadewa, F. Y. (2006). Design and Development of Conceptual Framework in Research.
In A. I. Olayinka, V. O. Taiwo, A. Raji-Oyeladi & I. P. Farai (Eds.), Methodology of Basic and
Applied Research (2nd ed.). Ibadan: Postgraduate School, UI.
Best, J. W. & Kahn, J. V. (1986). Research in Education. New Delhi: Prentice Hall of India Private
Limited.
Best, J. W. (1980). Research in Education. Englewoods Cliffs, New Jersey: Prentice-Hall, Inc.
Brown, F. G. (1976). Principles of Educational and Psychological Testing. New York: Rinehart
and Winston.
Chase, C. I. (1978). Measurement for Educational Evaluation. Boston: Allyn and Bacon.
Cohen, L., Manion, L. & Morrison, K. (2007). Research Methods in Education. 6th Ed. London
and New York: Routledge (Taylor and Francis Group).
Ebel, R. L. (1965). Measuring Educational Achievement. Englewood Cliffs, New Jersey: Prentice-
Hall, Inc.
Ebel, R. L. (1979). Essentials of Educational Measurement. New York: Prentice Hall, Inc.
Egbule, J. F. (2008). Fundamentals of Test, Measurement and Evaluation, 2nd Edition. Lagos –
Nigeria: Havilah Functional Publishers.
Gibson, J. T. (1980). Educational Psychology for the Classroom. Englewood Cliffs, New Jersey:
Prentice-Hall, Inc.
Itsuokor, D. E. (1986). Essentials of Test and Measurement. Ilorin – Nigeria: Wove and Son Ltd.
Johnson, R. (1975). Attitude Change Methods. In F. H. Kanfer & A. P. Goldstein (Eds.), Helping
People Change: A Textbook of Methods. New York: Pergamon Press.
Kalof, L., Dan, A. & Dietz, T. (2008). Essentials of Social Research. Berkshire, England: Open
University Press.
Karmel, L. J. & Karmel, M. O. (1998). Measurement and Evaluation in Schools. New York:
Macmillan Publishers Co. Inc.
Lien, A. J. (1976). Measurement and Evaluation of Learning. New York: Brownxo Publishers.
Ipaye, T. (1982). Continuous Assessment in Schools. Ilorin – Nigeria: Ilorin University Press.
Nwankwo, J. I. & Emunemu, B. O. (2014). Handbook on Research in Education and the Social
Sciences. Ibadan: Giraffe Books.
Obe, E. O. (1980). Educational Testing in West Africa. Lagos: Premier Press & Publishers.
Okobia, D. O. (1990). Construction and Validation of Social Studies Achievement Test (SSAT) for
JSS II Students in Selected Secondary Schools in Bendel State. An Unpublished M.Ed Project,
University of Benin, Benin City – Nigeria.
Reilly, R. R. & Lewis, E. C. (1983). Educational Psychology: Applications for Classroom Learning
and Instruction. New York: Macmillan Publishing Co., Inc.
Shertzer, B. & Linden, J. (1979). Fundamentals of Individual Appraisal. Boston: Houghton Mifflin
Company.
Uzoagulu, A. E. (1998). Practical Guide to Writing Research Project Reports in Tertiary
Institutions. Enugu: John Jacobs Classic Publisher Ltd.