Unit 4
Formative Assessment
Formative Assessment is part of the instructional process. It is carried out at regular intervals
in the course of teaching and is intended to support and improve teaching and learning. It can
be seen as an ongoing diagnostic tool that enables the teacher to adjust their teaching plans in
the light of what students have (and have not) learned so far. Formative assessment is usually
informal and low-key to the extent that students may hardly realize that they are being
assessed. The assessment may be carried out on groups of students, with the teacher
observing what they are able to do. Consequently, the outcomes of formative assessment may
be generalized rather than specific, helping the teacher to decide what still needs to be taught.
Formative assessment is likely to place greater emphasis on validity than reliability, since the
outcomes are generally ‘low-stakes’ and not intended to grade or rank those being assessed.
Summative assessment
Informal Assessment
This is a quick and casual way of finding out about a pupil's performance. It gives a general
picture of the pupil's achievement, attitude, character and aptitude, and it does not usually
involve pencil-and-paper tests. It includes observation, interviews, work sample analysis,
assignments/projects and quizzes. The main aim of informal assessment is to assess and
evaluate the performance and practical skills of learners without using the standardized tests
and scoring patterns which are officially in practice. As a result, informal assessment makes
no use of standardized tools for calculating or evaluating students' performance. To carry
out informal assessments, various kinds of projects, experiments and presentations can be
set for students, whether in the classroom or on any other platform. A student chosen by the
teacher may, for example, be asked a question which he or she has to answer in front of the
whole class. Informal assessment also takes the form of observation in and out of the
classroom.
iii) Achievement tests:
These measure the degree of students' learning in specific curriculum areas in which
instruction has been given. They measure the extent to which students have acquired
certain information or have mastered the required skills, i.e. the present level of
knowledge, skills and competence. Usually achievement tests are conducted at the end
of a course of study, a term or a semester.
It is a performance appraisal test that measures the extent to which a trainee has
acquired certain information, or has mastered the required skills. An achievement test
is a test of developed skill or knowledge. The most common type of achievement test
is a standardized test developed to measure skills and knowledge learned in a given
grade level, usually through planned instruction, such as training or classroom
instruction.
Definition
This means that the final grade awarded a student at the end of the term or year is an
accumulation of all the attainments throughout the term or year. Any decision on the student
is based on all the scores obtained in all measurements during the period under review.
A student for example might be declared to have attained the acceptable level of mastery in
social studies to warrant a promotion. This means that the sum total of all his scores in class
assignments, homework, weekly tests, mid-term tests, end of tests, class discussions and
projects, reached the desired level of competence. The decision does not rest on only one
score in an end-of-year examination. Any decision made on the student therefore considers
all the previous decisions made about him.
Opportunities are provided for the assessment of the total personality of the student. This
involves the assessment of tasks, activities and outcomes as demonstrated in the cognitive
(knowledge), affective (attitude) and psychomotor (skill) domains. It must however be
emphasized that these three domains must not always be included in the programme before
the process is described as continuous assessment.
In addition many types of evaluation procedures are used. These include teacher-made tests,
classroom observations, class assignments and projects, oral questions, standardized tests,
interviews and autobiographies. The scores obtained from using all these procedures are
combined to arrive at a final grade or classification of the student.
Continuous assessment allows for immediate and constant feedback to be provided to the
student on his performance. The student, often with the help of the teacher and school
counsellor, can analyse the feedback results. On the basis of the information derived from such
an analysis, various strategies are adopted.
Guidance aims at helping the individual to accept his/her ‘worth’. He/she identifies and
accepts strengths and weaknesses. He/she works hard to consolidate the strong areas and
improve upon the weak areas. Continuous assessment aims at playing this guidance role. As
the student is actively involved in the teaching-learning activities and tasks, his areas of
weakness and strength are easily identified early and from time to time. The teacher then
helps the student to strengthen further his strong areas and attempts to improve upon his
weaknesses, to attain the level of mastery needed. The student is thus directed and motivated
in his learning.
Learning and taking tests is not an end in itself. It is a step towards achieving total growth and
development. It also provides the necessary information for the student to decide his future
career and his world of work.
A decision needs to be taken on the dates and periods on which the various measurements will
be made. It is also important to include in the plan the types of instruments to be used. These
could be teacher-made tests, standardized tests, interview schedules, questionnaires, project
sheets, observation schedules and checklists. Specific times should also be stated for
filling in the scores that students obtain on the appropriate forms.
Advantages
Disadvantages
Even though continuous assessment achieves much in terms of students and teacher
evaluation of the instructional process and product, there are problems and weaknesses.
1) Give class assignment/exercises fortnightly and record the scores of four of them with
a maximum score of 10 each,
2) Conduct three class tests in a term with a subtotal 40,
3) Give pupils at least four projects/homework in a term with a subtotal of 20
The three assessments give a total score of 100, which is scaled down to 30% as the internal
mark for each pupil.
At the end of the junior and senior secondary schools, all the scores a pupil obtains are scaled
to 30% and forwarded to the WAEC where 70% is obtained for external assessment.
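The arithmetic of this internal mark can be sketched as follows. This is a minimal illustration in Python, with hypothetical pupil scores; only the subtotals (40 + 40 + 20 = 100) and the scaling to 30% come from the text above.

```python
# Internal continuous assessment mark, as described above (hypothetical scores).
class_exercises   = [7, 8, 6, 9]   # four recorded exercises, each out of 10 (subtotal 40)
class_tests       = [12, 10, 14]   # three class tests with a subtotal of 40
projects_homework = [6, 5, 4, 3]   # four projects/homework with a subtotal of 20

raw_total = sum(class_exercises) + sum(class_tests) + sum(projects_homework)  # out of 100
internal_mark = raw_total * 30 / 100  # scaled down to 30% as the internal mark

print(f"Raw total: {raw_total}/100 -> internal mark: {internal_mark:.1f}/30")
```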
For the policy to be successful, teachers are expected to perform the following roles.
It involves the teacher from the beginning to the end: from planning the assessment
programme, to identifying and/or developing appropriate assessment tasks right
through to making the assessment judgments.
It allows for the collection of a number of samples of student performance over a
period of time.
It can be adapted and modified by the teacher to match the teaching and learning
goals of the particular class and students being assessed.
It is carried out in ordinary classrooms.
It is conducted by the students’ own teacher.
It involves students more actively in the assessment process, especially if self and/or
peer assessment is used in conjunction with teacher assessment.
It allows the teacher to give immediate and constructive feedback to students.
It stimulates continuous evaluation and adjustment of the teaching and learning
programme.
It complements other forms of assessment, including external examinations.
The marks for SBA should together constitute the School Based Assessment component,
marked out of 60. The emphasis is to improve students' learning by encouraging
them to perform at a higher level. The SBA will hence consist of:
End-of-month tests
Homework assignments (specially designed for SBA)
Project
The SBA system will consist of 12 assessments a year instead of the 33 assessments in the
previous continuous assessment system. This will mean a reduction of about 64% in the work
load compared to the previous continuous assessment system. The 12 assessments are labelled
Task 1, Task 2, Task 3 and so on up to Task 12. Tasks 1 – 4 will be administered in Term 1,
Tasks 5 – 8 will be administered in Term 2, and Tasks 9 – 12 in Term 3.
Task 1 will be administered as an individual test coming at the end of the first month of the
term. The equivalent of Task 1 will be administered in Term 2 and Term 3 respectively. Task
2 will be administered as a Group Exercise and will consist of two or three instructional
objectives that the teacher considers difficult to teach and learn. The selected objectives could
also be those considered very important and which therefore need pupils to put in more
practice. Task 2 will be administered at the end of the second month in the term. Task 3 will
also be administered as an individual test under the supervision of the class teacher at the
end of the 11th or 12th week of the term.
Task 4 (and also Task 8 and Task 12) will be a project to be undertaken throughout the term
and submitted at the end of the term. Schools will be supplied with 9 project topics divided
into three topics for each term. A pupil is expected to select one project topic for each term.
Projects for the second term will be undertaken by teams of pupils as Group Projects. Projects
are intended to encourage pupils to apply knowledge and skills acquired in the term to write
an analytic or investigative paper, write a poem (as may be required in English and Ghanaian
Language), use science and mathematics to solve a problem or produce a physical three-
dimensional product as may be required in Creative Arts and in Natural Science.
Apart from the SBA, teachers are expected to use class exercises and homework as processes
for continually evaluating pupils’ class performance, and as a means for encouraging
improvements in learning performance.
The end-of-term examination is a summative assessment system and should consist of a
sample of the knowledge and skills pupils have acquired in the term. The end-of-term test for
Term 3 should be composed of items/questions based on the specific objectives studied over
the three terms, using different proportions. For example, a teacher may build an end-of-Term
3 test in such a way that it would consist of 20% of the objectives studied in Term 1, 20%
of the objectives studied in Term 2, and 60% of the objectives studied in Term 3.
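These proportions can be turned into item counts directly. The sketch below assumes a hypothetical 40-item end-of-Term 3 test; only the 20/20/60 split comes from the text.

```python
# Distributing end-of-Term 3 items across the objectives of the three terms.
total_items = 40  # hypothetical test length
proportions = {"Term 1": 0.20, "Term 2": 0.20, "Term 3": 0.60}

for term, p in proportions.items():
    print(f"{term}: {round(total_items * p)} items")  # 8, 8 and 24 items
```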
The new SBA system is important for raising pupils’ school performance. For this reason, the
60 marks for the SBA will be scaled to 50 in schools. The total marks for the end of term test
will also be scaled to 50 before adding the SBA marks and end-of-term examination marks to
determine pupils’ end of term results. The SBA and the end-of-term test marks will hence be
combined in equal proportions of 50-50. The equal proportions will affect only assessment in
the school system. It will not affect the SBA mark proportion of 30% used by WAEC for
determining examination results at the BECE.
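The 50-50 combination can be sketched as below; the SBA and examination marks used are hypothetical, while the scaling (SBA out of 60 scaled to 50, examination scaled to 50) follows the description above.

```python
# Combining SBA and end-of-term examination marks in equal 50-50 proportions.
def end_of_term_result(sba_mark, exam_mark, exam_total):
    """Scale the SBA mark (out of 60) and the exam mark to 50 each, then add."""
    sba_scaled = sba_mark * 50 / 60
    exam_scaled = exam_mark * 50 / exam_total
    return sba_scaled + exam_scaled

print(end_of_term_result(sba_mark=48, exam_mark=70, exam_total=100))  # 75.0
```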
Grading Procedure
The grading system presented above shows the letter grade system and equivalent grade
boundaries. In assigning grades to pupils' test results, or any form of evaluation, you may
apply the above grade boundaries and descriptions. The descriptors (Excellent, Very Good,
etc.) indicate the meaning of each grade. For instance, the grade boundary for "Excellent"
consists of scores ranging from 80% to 100%. Writing "80%", for instance, without
writing the meaning of the grade, i.e. the descriptor "Excellent", does not provide the pupil
with enough information to evaluate his/her performance in the assessment. You therefore
have to write the meaning of the grade alongside the score. Apart from the score and the
grade descriptor, it is also important to write a short diagnosis of the points the pupil
should consider in order to do better in future tests.
Note that, the grade boundaries above are also referred to as grade cut-off scores. When you
adopt a fixed cut-off score grading system as in this example, you are using the criterion-
referenced grading system. By this system a pupil must make a specified score to earn the
appropriate grade. This system of grading challenges pupils to study harder to earn better
grades. It is hence very useful for achievement testing and grading.
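A fixed cut-off grading scheme of this kind is easy to express as a simple function. In the sketch below, only the "Excellent" boundary (80% to 100%) is given in the text; the lower boundaries and descriptors are illustrative assumptions.

```python
# Criterion-referenced (fixed cut-off) grading sketch.
def grade(score):
    if score >= 80:
        return "A", "Excellent"      # boundary given in the text
    if score >= 70:
        return "B", "Very Good"      # assumed boundary
    if score >= 60:
        return "C", "Good"           # assumed boundary
    if score >= 50:
        return "D", "Pass"           # assumed boundary
    return "F", "Fail"               # assumed boundary

letter, descriptor = grade(83)
print(f"83% -> {letter} ({descriptor})")  # write the descriptor alongside the score
```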
Has improved
Could do better
Hardworking
Test Reliability
Definition
Points to note when applying the concept of reliability to testing and assessment.
i. Reliability refers to the results obtained with an assessment instrument and not to
the instrument itself.
ii. An estimate of reliability refers to a particular type of consistency.
iii. Reliability is a necessary condition but not a sufficient condition for validity.
iv. Reliability is primarily statistical. It is determined by the reliability coefficient,
which is the degree of relationship between two sets of scores intended to be
measures of the same characteristic. It ranges from 0.0 to 1.0.
Definition of terms: in classical test theory, X = T + E, where X is the observed score, T is
the true score and E is the error score.
Reliability can be defined theoretically as the ratio of true score variance to the observed
score variance.
1. It estimates the amount by which a student's obtained score is likely to deviate from
his/her true score. E.g. SEM = 4 indicates that a student's obtained score lies up to 4
points above or below the true score. An obtained score of 75 means the true score
lies between 71 and 79. The band from 71 to 79 therefore provides a confidence band
for interpreting the obtained score. A small standard error of measurement indicates
high reliability, providing greater confidence that the obtained score is near the true score.
2. In interpreting the scores of two students, if the ends of their bands overlap, as in
Example 1, then there is no real difference between the scores.
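The confidence band in the example can be reproduced with the standard formula SEM = SD × √(1 − reliability); the standard deviation and reliability figures below are hypothetical, chosen so that SEM = 4 as in the text.

```python
import math

# Standard error of measurement and the confidence band around an obtained score.
def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

def confidence_band(obtained, sd, reliability):
    e = sem(sd, reliability)
    return obtained - e, obtained + e

print(confidence_band(obtained=75, sd=8, reliability=0.75))  # (71.0, 79.0)
```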
1. Inter-Rater Reliability: This type of reliability is established by having two or more
persons score or rate each student's paper. The sets of scores of the students (one set
per scorer) are then correlated. The resulting correlation coefficient is known as scorer
reliability or inter-rater reliability.
1. Test length: Longer tests give more reliable scores. A test consisting of 40 items will
give a more reliable score than a test consisting of 25 items. Wherever practicable,
give more items (the Spearman-Brown sketch after this list quantifies the effect).
2. Group variability: The more heterogeneous the group, the higher the reliability. The
narrower the range of the group's ability, the lower the reliability. Differentiate among
students: use items that differentiate the best students from the less able students.
3. Difficulty of items: Too difficult or too easy items produce little variation in the test
scores. This in turn lowers reliability. The difficulty of the assessment tasks should be
matched to the ability level of the students.
5. Scoring objectivity: Subjectively scored items result in lower reliability. More
objectively scored assessment results are more reliable. For subjectively scored items,
multiple markers are preferred.
5. Speed: When a test is a speed test, reliability can be problematic. This is because not
every student is able to complete all of the items in a speed test. Tests where most
students do not complete the items due to inadequate allocation of time result in lower
reliability. Sufficient time should be provided to students to respond to the items. In
contrast, a power test is a test in which every student is able to complete all the items.
6. Sole marking: Using multiple graders improves the reliability of the assessment
results. A single person grading may lead to low reliability, especially for essay tests,
term papers, and performances. Averaging the results of several markers increases
reliability.
7. Test-retest interval: The shorter the time interval between two administrations of a
test, the less likely the changes will occur and the higher the reliability will be.
8. Testing conditions: Errors arising from the testing situation (e.g., students
misunderstanding or misreading test directions, noise, distractions, and sickness)
lower reliability. Where test administrators do not adhere strictly to uniform test
regulations and practices, students' scores may not represent their actual level of
performance, and this tends to reduce reliability.
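Factor 1 above (test length) is often quantified with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k. A minimal sketch, with a hypothetical starting reliability:

```python
# Spearman-Brown prophecy formula: effect of test length on reliability.
def spearman_brown(reliability, k):
    return k * reliability / (1 + (k - 1) * reliability)

# Lengthening a 25-item test to 40 items (k = 40/25 = 1.6):
print(round(spearman_brown(0.70, 40 / 25), 2))  # about 0.79
```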
Validity
Nature of Validity
In using the term validity in relation to testing and assessment, certain points have to be
considered.
Principles of validation
There are four principles that help a test user/giver to decide the degree to which his/her
assessment results are valid.
1. The interpretations (meanings) given to students’ assessment results are valid only to
the degree that evidence can be produced to support their appropriateness.
2. The uses made of assessment results are valid only to the degree that evidence can be
produced to support their appropriateness and correctness.
3. The interpretations and uses of assessment results are valid only when the educational
and social values implied by them are appropriate.
4. The interpretations and uses made of assessment results are valid only when the
consequences of these interpretation and uses are consistent with appropriate values.
1. Content-related evidence
This type of evidence refers to the content representativeness and relevance of the tasks
or items of an instrument. Judgments of content representativeness focus on whether
the assessment tasks are a representative sample from a larger domain of performance.
Judgments of content relevance focus on whether the assessment task is included in
the test user's domain definition when standardized tests are used.
Content-related evidence answers questions like
i. How well do the assessment tasks represent the domain of important content?
ii. How well do the assessment tasks represent the curriculum as defined?
iii. How well does the assessment task reflect current thinking about what should
be taught and assessed?
iv. Are the assessment tasks worthy of being learned?
To obtain answers to these questions, a description of the curriculum and content to be learned
(or already learned) is obtained. Each assessment task is checked to see if it matches important
content and learning outcomes. Each assessment task is rated for its relevance, importance,
accuracy and meaningfulness.
One way to ascertain content-related validity is to inspect the table of specification, which is a
two-way chart showing the content coverage and the instructional objectives to be measured.
2. Criterion-related evidence
This type of evidence pertains to the empirical technique of studying the relationship between
the test scores or some other measures (predictors) and some independent external measures
(criteria). Criterion-related evidence answers the question: How well can the results of an
assessment be used to infer or predict an individual's standing on one or more outcomes
other than the assessment procedure itself? The outcome is called the criterion.
There are two types of criterion-related evidence. These are concurrent validity and
predictive validity.
Concurrent validity evidence refers to the extent to which individuals' current status on a
criterion can be estimated from their current performance on an assessment instrument. For
concurrent validity, data are collected at approximately the same time, and the purpose is to
substitute the assessment result for the score on a related variable, e.g. a written test of
swimming ability in place of scoring swimming itself.
Expectancy table can also be used for validation. An expectancy table is a two-way table that
allows one to say how likely it is for a person with a specific assessment result to attain each
criterion score level.
Test score    F     D     C     B     A     Total
90-99         -     -     20    60    20    100
80-89         -     8     33    42    17    100
70-79         -     20    33    40    7     100
60-69         -     22    44    28    6     100
50-59         6     28    44    22    -     100
40-49         7     40    33    20    -     100
30-39         17    42    33    8     -     100
20-29         25    50    25    -     -     100
Determine the degree of success by using a grade, e.g. C or better. A question will be: What is
the probability that a person with a score of 65 will succeed in this course (i.e. obtain
grade C or better)? The score of 65 lies in the 60-69 class, and for this class 78% (44 + 28 + 6)
are successful, so the person has a 78% chance of success.
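Reading the expectancy table can also be automated. The sketch below encodes two of the score bands from the table above and reproduces the 78% figure from the worked example.

```python
# Probability of attaining grade C or better, read from the expectancy table.
expectancy = {                      # score band -> {grade: percent of students}
    (60, 69): {"D": 22, "C": 44, "B": 28, "A": 6},
    (50, 59): {"F": 6, "D": 28, "C": 44, "B": 22},
}

def chance_of_success(score, min_grade="C"):
    order = ["F", "D", "C", "B", "A"]
    for (lo, hi), row in expectancy.items():
        if lo <= score <= hi:
            return sum(pct for g, pct in row.items()
                       if order.index(g) >= order.index(min_grade))
    return None  # score outside the encoded bands

print(chance_of_success(65))  # 44 + 28 + 6 = 78 (percent)
```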
3. Construct-related evidence: This type of evidence refers to how the assessment result
can be interpreted as reflecting an individual's status regarding an educational or
psychological trait, attribute or mental process. Examples of constructs are mathematical
reasoning, reading comprehension, creativity, honesty and sociability.
Content validity: does the assessment content cover what you want to assess?
Criterion-related validity: how well does the test measure what you want it to?
Construct validity: are you measuring what you think you're measuring?
It is fairly obvious that a valid assessment should have a good coverage of the criteria
(concepts, skills and knowledge) relevant to the purpose of the examination. The important
notion here is the purpose. For example:
There are two major types of classroom achievement tests. These are the essay-type tests and
the objective-type tests.
OBJECTIVE-TYPE TESTS
Description
An objective test is the type of test that requires a respondent to provide a brief response
which is usually not more than a sentence long. The tests normally consist of a large number
of items, and the responses are scored objectively, to the extent that competent observers can
agree on how responses should be scored.
There are two major types of objective tests. These are the selection type and the supply type.
The selection type consists of the multiple-choice type, true and false type and matching
type. The supply type has variations such as completion, fill-in-the-blanks and short-answer.
This is the most common objective-type test. A multiple-choice test is a type of objective
test in which the respondent is given a stem and then is to select from among three or
more alternatives (options or responses) the one that best completes the stem. The
incorrect options are called foils or distracters.
There are two types of multiple-choice tests. These are the single "correct" or "best response"
type and the "multiple response" type. The single "correct" or "best response" type consists of a
stem followed by three or more responses, and the respondent is to select only one option to
complete the stem.
Examples:
In which of the following sites would you, as a community health worker, advise a
community to dispose of refuse?
Multiple responses
The multiple response type consists of a stem followed by several true or false statements or
words. The respondent is to select which statement(s) could complete the stem.
An example is:
I) Arrest haemorrhage II) Bathe the patient III) Immobilize injured bone
A. I only
B. I and II
C. I and III
D. I, II and III
1. Scoring is easy, accurate and efficient. That is, they can be scored quickly and
accurately by machines, clerks, teaching assistants, and even students themselves.
This is so because the element of subjectivity in scoring is totally absent in the
multiple-choice test.
2. Highly objective measurement of student achievement.
3. They allow an extensive coverage of subject content. That is, the multiple-choice test
affords excellent content sampling, which generally leads to more content-valid
score interpretations.
4. They do not provide opportunities for bluffing.
5. They are best suited for measuring lower-level behaviours like knowledge and
comprehension
6. They provide economy of time in scoring.
7. Student writing is minimized. Premium is not placed on writing. Multiple-choice
items do not require students to write out and elaborate their answers.
8. They are amenable to item analysis. Item analysis is a procedure by which weaknesses
are detected within test items. This is made possible by the use of a number of
plausible distracters, which carry implications about students' understanding and
misunderstanding of whatever is being measured. Thus, the distracter a student chooses
may give the tester diagnostic insight into difficulties the student is experiencing. Items
of relatively high quality will discriminate between better and poorer students.
9. Scores are not affected by extraneous factors such as the likes and dislikes of the
scorer.
Disadvantages of MCQs
1. They are relatively difficult to construct. The construction of multiple-choice test items is
time and energy consuming. It is difficult to write good multiple – choice tests with
equally plausible alternatives.
2. Item writing is time consuming
3. They are susceptible to guessing. The error introduced by guessing is only reduced by
the use of several options, not entirely overcome. The chance element is still
present.
4. Higher-order mental processes like analysis, synthesis and evaluation are difficult to
measure.
5. They place a premium on students' reading ability.
6. The selection format of multiple-choice items does not allow pupils to construct,
organize and present their own answers. Students must choose from a fixed list of
options rather than creating their own ideas or solutions.
1. The central issue of the item should be in the stem. It should be concise, easy to
read and unambiguously worded.
Poor
Ghana
Good
The largest man-made lake in Africa is in
A. Chad
B. Ghana
C. Kenya
D. Tanzania
E. Uganda
2. Avoid repetition of words in the option. The stem should be written so that key
words are incorporated in the stem and will not have to be repeated in each option.
Poor
Which is the best definition of a contour-line?
A. A line on a map joining places of equal barometric pressure.
B. A line on a map joining places of equal earthquake intensity.
C. A line on a map joining places of equal height.
D. A line on a map joining places of equal mean temperature.
E. A line on a map joining places of equal rainfall.
Good
A line on a map joining places of equal pressure is called an
A. isobar
B. isobront
C. isochasm
D. isogeothem
E. isotherm
3. Avoid using "all of the above" as an option; "none of the above" can be used
sparingly, if at all. "None of the above" should be used as an option only when an
item is of the correct-answer type and not the best-answer type.
4. Vary the placement of the correct options. The correct options should be placed
randomly throughout the test. This is important so that no discernible (i.e. easily
learned) pattern of correct/best answers can be noticed.
5. The correct alternative should be of the same overall length as the distracters. Do
not make the correct option consistently longer than the incorrect
answers/options by phrasing it in a more completely explained or more qualified
manner.
6. Create independent items: The answer to one item should not depend on
knowledge of the answer to a previous item. Try to avoid linking and clueing.
Linking means that the answer to one or more items depends on obtaining the
correct answer to a previous item.
7. Specific determiners, which are clues to the best/correct option, should be avoided.
Poor
The first woman cosmonaut is a
A. American
B. Englishman
C. Irish
D. Italian
E. Russian
Good
The first woman to go into space is a/an
A. American
B. British
C. French
D. Italian
E. Russian
In the poor example, the article "a" gives a clue that the correct option is "Russian"
(otherwise "an" would be needed for "American" and "Irish"). In addition, it is only
Russians who use the term "cosmonaut". Also, "Englishman" does not fit the group of
options, since the item concerns a woman.
Example:
A. Bed-bug
B. Black-fly
C. Body louse
D. Housefly
E. Tsetse fly
10. The expected response should not be put at the beginning of the stem
Poor example
.........................printing devices transmit output to a printer via radio waves.
A. Infrared
B. Laser
C. Bluetooth
D. Large Format
Good Example
What printing devices transmit output to a printer via radio waves?
A. Bluetooth
B. Cartridge
C. Infrared
D. Laser
Weaknesses or Disadvantages
1. They incorporate an extremely high guessing factor. They are subject to gross
error as a consequence of rampant guessing. Pupils’ scores on especially short
true – false test may be unduly influenced by good or poor luck in guessing.
2. Often results in ambiguous statements due to difficulty in writing statements that
are absolutely true or false.
3. Often includes more specific determiners or irrelevant clues.
1. For Simple, Compound and Multiple types, statements must be definitely true or
definitely false.
Poor: The value of 2/3 as a decimal fraction is 0.7. True or False
Good: The value of 2/3 expressed as a decimal fraction correct to two decimal places is
0.66. True or False
4. Statements should be original. They must not be copied directly from textbooks, past
test items or any other written material.
5. Statements should not be worded such that superficial logic suggests a wrong answer.
Poor: A patient took one tablet of a prescribed medicine and was healed in 24 hours.
8 tablets would therefore heal him in 3 hours. True or False
The true case is that 8 tablets would constitute an overdose.
7. State each item positively. Negatively stated items could, however, be used with the
negative word, not, emphasized by underlining or writing in capital letters. Double
negatives should be avoided.
8. Statements should be short, simple and clear. Ambiguous as well as tricky statements
should be avoided.
Examples: (1) Abedi Pele is the best Ghanaian footballer. True or False
(2) Margaret Thacher was the British Prime Minister in 19…
True or False
Item 1 is ambiguous because "best" is relative, while the trick in item 2 is the spelling
of "Thatcher".
10. Arrange the items such that the correct responses do not form a discernible pattern
like TTTT FFFF TTTT FFFF
11. To avoid scoring problems, let students write the correct options in full.
Matching-Type Tests
Description
The matching type of objective test consists of two columns. The respondent is expected to
associate an item in Column A with a choice in Column B on the basis of a well-defined
relationship. Column A or 1 consists of the questions or problems to be answered. These are
known as the list of premises. Column B or 2 contains the possible answers, which are known
as the responses or options. A matching exercise therefore presents a student with three
things:
A. directions for matching
B. a list of premises, and
C. a list of responses
Example 1
Match the vitamins in Column A with the diseases and conditions which a lack of the vitamin
causes in Column B.
Column A Column B
Vitamins Diseases caused by lack
1. Vitamin A a. Beriberi
2. Vitamin C b. Kwashiorkor
3. Vitamin D c. Pellagra
d. Poor eyesight
e. Rickets
f. Scurvy
Example 2: Directions
For each definition in column A below, select the most appropriate term from the set of terms
in column B:
Good example:
Instruction: Select a river from list B to complete the description in list A. Write the answer
against the number in list A.
A B
Description of river Name of river
1. Aswan Dam is built on it. a. Niger
2. The longest river in Africa. b. Nile
3. It is a tributary of River Congo. c. Orange
d. Ubangi
e. Volta
f. Zambezi
6. Provide complete directions. Instructions should clearly show what the rules are and also
how to respond to the items.
7. State clearly what each column represents.
8. Avoid clues (specific determiners) which indirectly reveal the correct option.
9. All options must be placed (and typed) on the same page.
Advantages
1. Scoring is easy and objective.
2. They allow an extensive coverage of subject content.
3. They do not provide opportunities for bluffing.
4. They are best suited for measuring lower-level behaviors like knowledge and
comprehension.
5. They provide economy of time in scoring.
6. Student writing is minimized. Premium is not placed on writing.
7. Item and statistical analysis can be done easily.
8. Scores are not affected by extraneous factors such as the likes and dislikes of the
scorer.
Disadvantages
1. Item writing is time consuming.
2. It provides opportunities for guessing.
3. Items are relatively difficult to construct.
4. Higher-order mental processes like analysis, synthesis and evaluation are difficult to
measure.
5. A premium is placed on student’s reading ability.
Constructed-Response Type
Short-Answer Type Tests
Description
This type of objective test is also known as the supply, completion, or fill-in-the-blanks type.
It consists of a statement or question, and the respondent is required to complete it with a short
answer, usually not more than one line. This could be a word, a phrase or a symbol. It is easily
recognized by the presence of one or more blanks in which the student writes his/her answer
to fill or complete the blank.
There are three common varieties of the short answer form. These are:
The question variety in which the item is presented as a direct question
The completion variety in which an incomplete statement is used and
The association or identification variety
Example (association variety): Write the symbol of each element.
Element      Symbol
Oxygen       ______
Gold         ______
Potassium    ______
Zinc         ______
Advantages
1. Scoring is easy.
2. They allow an extensive coverage of subject content.
3. They do not provide opportunities for bluffing.
4. Minimizes guessing.
5. They are best suited for measuring lower-level behaviours, especially knowledge,
comprehension and application.
Disadvantages
1. They are difficult to construct so that the desired response is clearly indicated.
2. Higher-order objectives and behaviours are difficult to measure.
3. Often includes more specific determiners.
4. Difficult and time-consuming to score, since more than one answer may have to
be considered.
ESSAY-TYPE TESTS
Description:
An essay-type test is a test that gives freedom to the respondent to compose his own response
using his own words. The test consists of relatively few items, but each item demands an
extended response.
There are two types of essay-tests. These are the restricted response type and the extended
response type.
The restricted response type limits the respondent to a specified length and scope of
response. For example: "In not more than 200 words, explain the causes of the Yaa
Asantewaa War of 1900."
The extended response type does not limit the student in the form and scope of the answer.
For example: "Discuss the factors that led to the overthrow of Dr. Kwame Nkrumah's
government in Ghana in 1966."
Advantages
1. It is easier to prepare an essay test than an objective test. The time spent in writing
essay-type test items is comparatively less than for objective test items.
2. It is the only means of providing the respondent with the freedom to organize his / her
own ideas and response within unrestricted limits. Thus students or testees have a
greater degree of freedom to respond to the item.
3. Guessing is eliminated on the part of the respondents, since there are no options from
which students can select.
4. Skills such as the ability to organize material and the ability to write and arrive at
conclusions are improved.
5. Essay-type tests encourage good study habits as students learn material as a whole.
This is so because, since the essay test is holistic in subject matter, it covers whole
facts and processes rather than parts. This in turn discourages learning by rote and
ensures comprehensive learning.
6. They measure some complex learning outcomes which objective-type tests fail to cover.
They are best for testing higher-order behaviours and mental processes such as analysis,
synthesis and evaluation. In other words, essay test items elicit divergent
thinking.
7. Little time is required to write the test items.
8. They are practical for testing a small number of students.
Disadvantages
1. They are difficult to score objectively. Thus, a major weakness of an essay test is the
subjectivity of scoring. There is a good deal of inconsistency between scores obtained
from successive administrations of the same test. Intra-rater and inter-rater
variability could be very high. In fact, Elliot (cited in Betsey, 2001) reported that inter-rater
variability could be as high as 68%.
2. Essay test provides opportunity for bluffing where students write irrelevant and
unnecessary material. Students who do not have much to write may rely on the power of
vocabulary to attempt to convince the assessor.
3. Since students cannot be made to respond to many essay items in a particular testing
time, only limited aspects of students' knowledge are measured.
4. Essay tests suffer from limited sampling of subject matter content. Several content
areas or topics may be omitted.
5. Essay test is time-consuming to both the teacher who scores the responses and the student
who writes the responses.
6. They are susceptible to the halo effect, where the scoring is influenced by extraneous
factors such as the relationship between scorer and respondent, the respondent's handwriting, etc.
7. A premium is placed on writing. Students who write faster are, all things being equal,
expected to score higher marks.
8. The degree of correctness or merit of a student’s response can be effectively judged only
by a critical reader, a teacher skilled or informed about the subject.
(b) Analysis:
A Form 1 girl was severely and unfairly punished. Describe the feelings such
treatment aroused in her.
(c) Synthesis:
You are the financial secretary of a society aimed at raising money to build a fish
pond in your community. Plan and describe a promotional campaign for raising the
money.
(d) Evaluation:
Evaluate the function of the United Nations Organization as a promoter of world
peace.
6. The length of the response and the difficulty level of items should be adapted to the
maturity level of students (age and educational level).
An item like:
"Discuss the implications of the Lomé II Convention on the economy of Ghana"
would be too difficult for a first year Senior Secondary School student.
7. Optional items should not be provided when content is relevant. They may be
necessary only for large external examinations and when the purpose of the test is to
measure writing effectiveness. If students answer different questions, an analysis of
the performances on the test items is difficult.
8. All items should be of equal difficulty if students are to select from a given number of
items.
9. Prepare a scoring key (marking scheme) at the time the test is prepared.
Decide in advance what factors will be considered in evaluating an essay response.
Determine the points to be included and the weights to be assigned to each point.
The preparation of a model answer will help disclose ambiguities in an item.
10. Establish a framework and specify the limits of the problem so that the student knows
exactly what to do.
The following item, for example, does not provide any framework for the student to operate within.
Write brief notes on the following:
a. United Nations Organization (UNO)
b. African Union (AU)
c. European Union (EU)
11. Present the student with a problem which is carefully worded so that only ONE
interpretation is possible. The questions/items must not be ambiguous or vague.
For example: Family Planning in Ghana is a "mixed bag". Discuss.
Different interpretations could be given to the term "mixed bag", making the item
vague.
12. Indicate the value of the question and the time to be spent in answering it.
13. Structure the test item such that it will elicit the type of behaviour you really want to.
14. The test items must be based on the instructional objectives for each content unit.
An item like: "Discuss the factors which in your opinion contributed to the escalation
of the Persian Gulf War in 1990."
This item elicits students' opinions, which might be different from the behaviour desired.
15. Give preference to a large number of items that require brief answers. These provide
a broader sampling of subject content and are thus better than a few items that require
extended responses.
16. Statements and sub-questions for each item should be clearly related.
Poor:
A. The facilitator/tutor is central in the teaching/learning process in the
professional.
B. Evaluate the roles and responsibilities of tutors in distance education
programmes in Ghana.
C. "Schizophrenia is an illness in which the functioning of the brain becomes
disorganized, leading to disturbed feelings and emotions, confusion of thought,
abnormal behaviour and withdrawal from the reality of the environment into a
real world of fantasy" (Sainsbury, 1987). Discuss.
17. Start essay test items with words that are clear and as simple as possible and which
require the student to respond with the behaviour expected. Avoid words such as
"what", "list" and "who" as much as possible.
For example: "What can you as a teacher do to promote professionalism in the teaching
service in Ghana?" This item requires only a statement as the response and not an extended
answer.
Commonly used words to start essay questions.
1. Explain: to make plain or clear; to make known in detail; to tell what an
activity/process is, how it works and why it works the way it does.
2. Describe: to tell or depict (a picture) in written words.
3. Analyze: to determine elements or essential features; examine in detail to identify
causes, key factors, possible results.
4. Assess: to estimate or judge the value, character etc. of
5. Examine: to inspect or scrutinize carefully; to inquire into or investigate.
6. Discuss: to consider or examine by argument or comment; give points for and
against the content of the question.
7. Evaluate: to judge or determine the significance, worth or quality of; involves
discussion and making a judgment.
8. Give an account of: to describe a process/activity, giving reasons, causes, effects,
etc.
The analytic (also known as scoring key, point-score or trait-score) method.
In analytic scoring, the ideal or model answer is broken down into specific points. This
scoring method requires the tester to develop an outline or a list of the major elements that
students are to include in the ideal answer. He/she then decides on the number of points/
marks to award to students when they include each element. The student's score is based
upon the number of quality points contained in his/her answer. The analytic scoring rubric is
best for restricted response essays. The scoring rubric is the same as a marking scheme.
Example 1.
Discuss five reasons why there is the need for counselling in our tertiary institutions. (30
marks)
Introduction (3 marks)
Definition/Explanation of counselling
Reasons (5 reasons × maximum 5 marks each. Total = 25 marks). Consider expression and
mechanics of writing in addition to the discussion of each reason. Other relevant reasons
should be accepted.
1. Upsurge in institutional related problems.
2. Need to make informed choices.
3. Need to make informed decisions.
4. Inability of home to cope with all problems of students.
5. Promotion of attitudinal change.
6. Need for coping and adjustment skills.
7. Demands of modern world.
Conclusion (2 marks)
Example 2.
Introduction (3 marks)
Definition/Explanation of counselling.
Reasons (5 reasons × maximum 4 marks each. Total = 20 marks). Consider and accept other
reasons that are relevant.
1. Upsurge in institutional related problems.
2. Need to make informed choices.
3. Need to make informed decisions.
4. Inability of home to cope with all problems of students.
5. Promotion of attitudinal change.
6. Need for coping and adjustment skills.
7. Demands of modern world.
Conclusion (2 marks)
Mechanics of writing and Expression (5 marks)
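Totalling an analytic rubric like Example 2 is simple bookkeeping. In the sketch below, the maximum marks follow Example 2, while the marks awarded to the pupil are hypothetical.

```python
# Totalling an analytic (point-score) rubric.
rubric_max = {"introduction": 3, "reasons": 20, "conclusion": 2, "mechanics": 5}
awarded    = {"introduction": 2, "reasons": 15, "conclusion": 2, "mechanics": 4}

total = sum(awarded.values())
print(f"{total}/{sum(rubric_max.values())}")  # 23/30
```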
Advantages
Generally, holistic scoring is simpler and faster than analytic scoring. It also
helps to review papers as a whole.
It is effective when large numbers of essays are to be read.
Limitations
Its limitations include the inability to give specific feedback to students as to their
strengths and weaknesses.
Raters give overall marks and do not point out details to their students that might
help them to improve. Scorers' own biases and errors can easily be masked (go
unnoticed) by the overall mark.
DIFFERENCES
Essay Test: Requires students to plan their own answers and to express them in their own words.
Objective Test: Requires students to choose among several designated alternatives or write a short answer.

Essay Test: Consists of relatively fewer questions but calls for lengthy and extended responses.
Objective Test: Consists of many items requiring only brief answers (one or two words or a short phrase).

Essay Test: The student spends most of his/her time thinking and writing while taking the test.
Objective Test: The student spends most of his/her time reading and thinking while taking the test.

Essay Test: Quality of the test depends largely on the skill of the rater/test scorer.
Objective Test: Quality of the test is determined largely by the skill of the test constructor.

Essay Test: Relatively easy to prepare but more difficult and tedious to grade accurately.
Objective Test: Relatively tedious and difficult to prepare but rather easy to grade or score.

Essay Test: Affords both the student and the grader the opportunity to be individualistic.
Objective Test: Affords freedom of expression only to the constructor (item writing).

Essay Test: More susceptible to bluffing.
Objective Test: More susceptible to guessing.

Essay Test: The score distribution may vary from one scorer to another.
Objective Test: The score distribution is determined largely by the test.

Essay Test: Less amenable to item and statistical analysis.
Objective Test: Amenable to item and statistical analysis.

Essay Test: Sampling is limited, hence content validity is low.
Objective Test: Sampling is usually extensive, hence content validity is high.

Essay Test: Reliability of test scores is low.
Objective Test: Reliability of test scores could be high.

Essay Test: Can measure both knowledge and complex achievement; measurement of complex achievement is recommended.
Objective Test: Can measure both, but measurement of knowledge and comprehension is more common.

Essay Test: Emphasis is primarily on larger units of material.
Objective Test: Emphasis is often on factual details.
Achievement Tests
Achievement tests are tests that measure the extent of present knowledge and skills. In
achievement testing, test takers are given the opportunity to demonstrate their acquired
knowledge and skills in specific learning situations. The main objective of an achievement
test is to produce a fair and representative indication of what pupils have learned from the
instruction they were given. Since the results of the test will be used to grade pupils and make
important decisions about them, it is necessary that the test provides valid and reliable
information about learning. If they do not, incorrect grades and poor decisions can result. A
clear understanding of this unit will therefore, equip teacher trainees with the skills to be able
to distinguish well-written test questions from poorly written ones.
There are two types of achievement tests. These are (1) standardized achievement tests
and (2) teacher-made/ classroom achievement test.
The major difference between these two types of tests is that standardized achievement
tests are carefully constructed by test experts, with specific directions for administering
and scoring the tests, e.g. WASSCE, BECE, UCC exams. This makes it possible for
standardized achievement tests to be administered to individuals in different places, often
at the same time.
These tests are constructed by classroom teachers for specific uses in each classroom and
are closely related to particular objectives. They are usually tailored to fit the teacher’s
instructional objectives. The content of the test is determined by the classroom teacher.
The quality of the test is often unknown, but usually lower than that of standardized tests,
e.g. class tests, quizzes, etc.
The main goal of classroom assessment is to obtain valid, reliable, and useful information
concerning students’ achievement. It is therefore important that good and quality tests and
assessment tasks are constructed. Four principal stages are involved in classroom testing.
These are:
The basic question to answer is, "Why am I testing?" The test must be related to the teacher's
classroom instructional objectives. Several purposes are served by classroom tests, and the
teacher has to be clear on the purpose of the test. The teacher has to answer other
questions such as "Why is the test being given at this time in the course?", "Who will take
the test?", "Have the test takers been informed?" and "How will the scores be used?"
Test items could either be essay or objective types. Objective-type tests include multiple-
choice, short-answer, matching and true and false. The choice of format must be
appropriate for testing particular topics and objectives. It is sometimes necessary to use
more than one format in a single test. In other words, there could be both essay and
objective items. Also, there could be a combination of multiple-choice and true-false
items in a particular testing situation. There are eight (8) factors to consider in the
choice of the appropriate format. These include:
(1) The purpose of the test, (2) the time available to prepare and score the test, (3) the
number of students to be tested, (4) skills to test, (5) difficulty desired, (6) physical
facilities like reproduction materials, (7) age of pupils, (8) skills in writing the
different types of items
The teacher asks himself/herself the question: "What is it that I wish to measure?" The
teacher has to determine what chapters or units the test will cover, as well as what
knowledge, skills and attitudes to measure. Instructional objectives must be defined in
terms of student behaviour and linked to what has been stressed in class.
A test plan made up of a table of specification or blueprint must be made. The
specification table matches the course content with the instructional objectives. To
prepare the table, specific topics and sub-topics covered during the instructional period
are listed. The major course objectives are also specified and the instructional objectives
defined. The total number of test items is then distributed among the course content
and instructional objectives or behaviours.
Content                 Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
Assessment                  1            1             1           -          -          -         3
Test                        1            1             1           -          -          -         3
Evaluation                  -            1             1           1          -          -         3
Continuous Assessment       -            2             1           1          -          -         4
Test Reliability            2            1             -           -          -          -         3
Test Validity               -            1             1           1          1          -         4
Total                       4            7             5           3          1          -        20
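A blueprint like this can be kept as a simple data structure so the totals can be checked automatically. The cell placements in the sketch below follow the reconstructed table above.

```python
# Table of specification as a data structure, with a check on the grand total.
blueprint = {
    "Assessment":            {"Knowledge": 1, "Comprehension": 1, "Application": 1},
    "Test":                  {"Knowledge": 1, "Comprehension": 1, "Application": 1},
    "Evaluation":            {"Comprehension": 1, "Application": 1, "Analysis": 1},
    "Continuous Assessment": {"Comprehension": 2, "Application": 1, "Analysis": 1},
    "Test Reliability":      {"Knowledge": 2, "Comprehension": 1},
    "Test Validity":         {"Comprehension": 1, "Application": 1, "Analysis": 1,
                              "Synthesis": 1},
}

total_items = sum(sum(row.values()) for row in blueprint.values())
print(total_items)  # 20
```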
In writing the individual items, the specific principles guiding the construction of each
type of test must be followed for the item format chosen. However, the following general
guidelines must be considered:
9. Keep the table of specification before you and continually refer to it as you write the
items.
10. Items must match the instructional objectives.
11. Formulate well-defined items that are not vague or ambiguous; items should be
grammatically correct and free from spelling and typing errors.
12. Avoid excessive verbiage. Avoid needlessly complex sentences.
13. The item should be based on information that the examinee should know.
14. Write the test items simply and clearly.
15. Prepare more items than you will actually need.
16. The task to be performed and type of answers required should be clearly defined.
17. Include questions of varying difficulty.
18. Write the items and the key as soon as possible after the material has been taught.
19. Avoid textbook or stereotyped language.
20. Write the items in advance of the test date to permit reviews and editing.
This is another important step in test construction. Since the purpose of an achievement
test is to provide a fair and representative indication of how well pupils have learned the
things they were taught, faulty items are undesirable regardless of whether they inhibit or
enhance pupils’ test performance. If pupils get lower scores because test items are poorly
written or ambiguous, a fair and representative indication of learning will not be obtained.
If on the other hand, clues in items help pupils do better than they would have without the
clues, a fair and representative indication of pupils’ learning will not be obtained.
Critically examine each item at least a week after writing it. Items that are ambiguous
or poorly constructed, as well as items that do not match the objectives, must be
reworded or removed. Items must not be too difficult or too easy. Check the length of
the test (i.e. the number of items) against the purpose, the kinds of test items used
and the ability level of the students. The test must discriminate between the low achievers
and the high achievers. Assemble the test in its final form for administration.
Many classroom teachers fail to score pupils’ work accurately. This is not a good practice
for obvious reasons. Prepare a scoring key or marking scheme while the items are fresh in
your mind. List correct responses and acceptable variations for objective-type tests.
Assign points to the various expected qualities of responses.
Assign values to each item and ensure representative sampling of content covered.
Step 7: Write directions
Give clear and concise directions for the entire test as well as sections of the test. Clearly
state the time limit for the test. Penalties for undesirable writing must be spelt out.
Directions must include the number of items to respond to, how the answers will be written,
where the answers will be written, the amount of time available, credit for orderly
presentation of material (where necessary) and the mode of identification of the respondent.
For selection-type tests, indicate what will be done about guessing. E.g. "Answer all questions
in Section A and only one (1) from Section B."
Before administration, the test should be evaluated by the following five criteria: clarity,
validity, practicality, efficiency and fairness.
Item analysis is the process of collecting, summarizing, and using information from students'
responses to make decisions about each test item. The steps in item analysis are as follows:
1. Arrange the marked test papers from the highest score to the lowest score.
2. Create three groups (upper, middle and lower) using the top 27% and the
bottom 27% if the total number of students is more than 40. Where the number of
students is between 20 and 40, select the top 10 students and the bottom 10 students.
For fewer than 20 students, create only two groups.
3. For each item, summarize the number of students in each of the upper and lower
groups who got the item correct.
4. Calculate the difficulty index, i.e. the percentage of the total number of students who
got the item correct. The difficulty index is by convention written as a decimal.
5. Compute the discrimination index, i.e. the difference between the percentages of
students in the upper and lower groups who got the item correct. The discrimination
index is often written as a decimal fraction (both indices are sketched in code after
this list).
6. Evaluate the effectiveness of options for multiple-choice tests (distracter analysis).
Every distracter should have at least one lower group student choosing it, and
more lower group students than the upper group students should choose it.
Every correct option should be selected by more students in the upper group.
Options are ambiguous if upper group students are unable to distinguish between
the correct response and one or more of the distracters.
If a large number of upper group students select a particular wrong response,
check to be sure the answer key is correct.
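The difficulty and discrimination indices in steps 4 and 5 can be computed as below. The upper/lower group results are hypothetical, and the group-based calculation shown is one common convention.

```python
# Difficulty and discrimination indices for a single item (hypothetical data).
def difficulty_index(num_correct, num_students):
    return num_correct / num_students            # written as a decimal

def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# 10 students per group: 8 in the upper and 3 in the lower got the item right.
print(difficulty_index(8 + 3, 20))               # 0.55
print(discrimination_index(8, 3, 10))            # 0.5
```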
Points to note
A low index of discriminating power does not necessarily indicate a defective item. Such
items could be examined, however, for the possible presence of ambiguity, clues, and other
technical defects, especially if they are selection-type items.
Negatively discriminating items should be avoided and not used in test construction.
Discrimination indexes of 0.50 and above are more desirable in test construction.
However, items with high, positive discrimination indices are used mostly by test
developers on standardized tests.
It is sometimes necessary to retain items with low discriminating power in order to
measure a representative sample of learning outcomes and course content.
Items with a 50% level of difficulty make maximum discrimination possible.
In norm-referenced testing, difficulty indices of between 0.16 and 0.84 are used to select
items where the test represents a single ability. If performance on the test represents
several different abilities, difficulty indices between 0.40 and 0.60 are used.
G. MEDIAN
The “median” is the “middle” value in the list of numbers. To find the median, your numbers
have to be listed in numerical order, so you may have to rewrite your list first.
Find the median for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
The median is the middle value, so you will have to rewrite the list in order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the 5th number:
13, 13, 13, 13, 14, 14, 16, 18, 21
So the median is 14
NOTE: when dealing with ungrouped data, the median is determined basically by inspection.
If the number of scores in a given distribution is odd, the median will be the single score at
the centre of the distribution located by inspection. If the number of scores in a distribution is
even, the median will be the average of the two scores at the centre of the distribution. Note
again that, the distribution has to be re – arranged before the single score or two scores at the
centre are located. Thus, for this distribution: 1, 2, 4, 6, 8 (N is 5, which is odd), the median is
4. But for this distribution: 1, 2, 4, 6, 8, 9 (N is 6, which is even), the median is
(4 + 6) / 2 = 5.
H. MODE
The “mode” is the value that occurs most often. If no number is repeated, then there is no
mode for the list.
Find the mode for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
The mode is the number that is repeated more often than any other, so 13 is the mode.
NOTE: if two scores have the same highest frequency in a given distribution, those
two scores will constitute the mode, and such a distribution is said to be 'bi-modal'. A
distribution that contains more than two modes is called 'multi-modal'. The mode is
determined basically by inspection and calls for no calculations in ungrouped data.
I. RANGE
The “range” is just the difference between the largest and smallest values.
Find the range for the following list of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8.
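All three statistics above can be checked with Python's standard library, using the same list of values:

```python
# Median, mode and range of the list used in the examples above.
from statistics import median, mode

scores = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(median(scores))             # 14
print(mode(scores))               # 13
print(max(scores) - min(scores))  # 21 - 13 = 8
```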
Score Frequency
2 3
3 2
4 0
5 1
6 1
7 1
8 0
9 1
This is what a frequency distribution table is. You find the frequency of each score, then
write down each score in the left column and write their frequencies in the right column.
Frequency histograms:
a) This is a pictorial representation of the data, constructed by drawing bars based on some
established class interval.
b) Frequencies are plotted against scores and a histogram constructed.
c) It indicates pictorially the most typical scores obtained by pupils, which gives a rough
measure of the ability of the class.
d) It also shows the direction of pupils' performance.
e) Individual and group comparisons can be made.
A Frequency Histogram
Frequency polygon/curve
a) This is an alternative way of representing frequency distributions in a graphical form.
b) Frequencies are plotted against scores and the points connected by lines to form a smooth
curve.
A Frequency polygon/curve
UNIT 7: INTERPRETATION OF ASSESSMENT SCORES
Scores obtained in classroom quizzes, tests and examinations are known as raw scores.
They give very little information about the performance or achievement of a student. For
example, if John obtained 48 in a test, it is difficult to know his level of performance unless
more information is provided. Such types of information include the maximum/ total score,
mean or median score, the variability of the group, the difficulty level of the items, the
number of test questions and the amount of time allowed for the test. To interpret and obtain
meaning from the scores, they need to be referenced or transformed into other scores.
There are two popular ways of interpreting test scores so that meaning can be derived from
the scores. These are:
1. Norm-referenced Interpretation
2. Criterion-referenced Interpretation
Norm-referenced interpretation
1. Class raw score ranks: Raw scores in a class are often ordered in merit form from the
highest score (1st position) to the lowest score (last position). The ranks tell about how
a student performs compared with the others in the group.
2. Percentile and percentile rank: A percentile is a point in a distribution below which a
certain percentage of the scores fall, while a percentile rank is a person's relative
position such that a given percentage of scores fall below the score obtained. If a raw
score of 48 is the 60th percentile, it means that a student who obtains 48 in a test has
done better than 60 percent of all those in the group that took the test.
3. Standard scores: These are either linear Z or T scores. A linear Z-score expresses a raw
score in standard deviation units, with a mean of 0 and a standard deviation of 1. Raw
scores are transformed to Z-scores using the formula
Z = (X – M) / S, where X is the raw score, M is the group mean and S is the group
standard deviation.
Negative values show that performance is below average and positive values mean
that performance is above average.
T-scores are based on Z-scores and use the formula T = 50 + 10Z. Scores above 50
show above-average performance and scores below 50 show below-average
performance.
Stanines (Standard Nine): These are derived scores based on the normal distribution
with a mean of 5 and a standard deviation of 2. They use the integers 1 – 9. The
percentages of scores at each stanine are: 9 (top 4%), 8 (next 7%), 7 (next 12%),
6 (next 17%), 5 (next 20%), 4 (next 17%), 3 (next 12%), 2 (next 7%) and 1 (bottom 4%).
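The standard-score conversions above can be sketched as follows; the class scores are hypothetical, and the sample standard deviation is used for illustration.

```python
# Z-scores and T-scores for a hypothetical set of raw scores.
from statistics import mean, stdev

scores = [48, 55, 62, 40, 70, 58, 45, 66]
m, s = mean(scores), stdev(scores)

def z_score(x):
    return (x - m) / s           # Z = (X - M) / S

def t_score(x):
    return 50 + 10 * z_score(x)  # T = 50 + 10Z

print(round(z_score(48), 2), round(t_score(48), 1))  # negative Z: below average
```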
Criterion-referenced interpretation
These describe test scores or performance in terms of the kinds of tasks a person with a given
score can do. The performance can be compared to a pre-established standard or criterion.
For example, a student may be able to solve 8 problems out of 10 concerning fractions. A
level of performance can be established at 6. The criterion or standard can be used as a
competency/mastery score, so that students who obtain scores of 6 or better are termed
competent or have mastered the skills in a particular domain. Criterion-referenced
interpretations generally indicate what an individual can or cannot do with respect to a
specified domain of knowledge, attitudes or skills.
1. Percentage correct score: This is the percentage of items that a student got correct.
For example, if a student obtained 8 marks out of 10, the percentage correct is 80.
2. Competency score: These are cut-off scores set to match acceptable performance.
Students who obtain the cut-off score are believed to have achieved the required
level of competency. Cut-off scores should not be arbitrarily set; there should be a
support or basis for them.
3. Quality rating: This is the quality level at which a student performs a task. For
example;
A student can be rated as A for outstanding, B+ for excellent etc.
4. Speed of performance scores: These indicate the amount of time a student uses to
complete a task or the number of tasks completed within a specified time. For
example, a student may type 30 words in a minute or an athlete may run 100 meters in
11.5 seconds.
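The criterion-referenced interpretations above reduce to a percentage-correct calculation and a cut-off comparison. A minimal sketch, using the cut-off of 6 out of 10 from the fractions example:

```python
# Percentage-correct score and a competency (mastery) decision.
def percent_correct(num_correct, num_items):
    return 100 * num_correct / num_items

def is_competent(num_correct, cut_off=6):
    return num_correct >= cut_off

print(percent_correct(8, 10))  # 80.0
print(is_competent(8))         # True
```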