Group 3 - Lecture Notes
This chapter intends to assist teachers in planning for the development of classroom-
based assessment to ensure its validity for measuring student achievement. This will provide
guidance on:
Specifying the purpose of the test from the very outset,
Identifying what essential learning outcomes to be measured and
Preparing a test blueprint that will guide the construction of items.
Multiple-choice items are good at diagnosing the source of difficulty in terms of
misconceptions and areas of confusion. Each option represents a type of error that
students are likely to commit. Here's an example of a multiple-choice item with distracters
selected to represent possible errors of students.
Item: − = ____
a. (Error in getting difference in both numerator and denominator)
b. (Correct option)
c. (Error in changing dissimilar fractions to similar ones)
d. (Error in getting difference between similar fractions)
A simple one-way TOS has only one element, the objective/skill being tested, with the
corresponding number of items/points. It is called a one-way grid.
In a two-way TOS, which is the one commonly used, two elements are
shown, i.e., what (subject matter and outcome) and how (the type of test format).
For the test or item format, two general types can be utilized:
Objective test items - consist of questions with only one right answer or one best
answer. They require the student to select the correct response or supply the missing fact.
Non-objective or performance tasks - require the learner to construct or design an
experiment or create responses in sentences, whether in written or oral form. Usually a
rubric or set of standards is used in grading these types of tests.
                                  Objective                          Performance
Subject Area   Outcome/Skills   Alternate Form    Gap Filling     Product Assessment
                                10 items (25%)    10 items (25%)  20 points (50%)
                                                                  (scored with rubrics)
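As a rough sketch, the percentage weights in a blueprint like the one above can be turned into item counts with a few lines of code. The 40-point total and the 25/25/50 split mirror the sample table; the category labels and code layout are illustrative assumptions, not part of the original notes.

```python
# Sketch: turning TOS weights into item/point allocations.
# The 40-point total and the weights mirror the sample two-way TOS above.
total_points = 40
weights = {
    "Alternate Form (objective)": 0.25,
    "Gap Filling (objective)": 0.25,
    "Product Assessment (performance)": 0.50,
}

# Each category gets its share of the total, rounded to whole items/points.
allocation = {category: round(total_points * w) for category, w in weights.items()}
for category, points in allocation.items():
    print(category, points)
```

With a 25/25/50 split over 40 points, this reproduces the 10/10/20 allocation shown in the table.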
This is an expanded TOS that indicates the specific item format to be used in
framing the test questions. This type of test blueprint is flexible and useful in
designing and planning a test, since the specific types of tests are shown in the table.
This type of TOS is used in planning long tests such as final examinations, showing in
the first column the different units of study meant to develop similar cognitive outcomes. This
example is for the subject Mathematics. Each unit may be intended to develop a common
set of skills like conceptual understanding, computational skills (application of concepts), and
problem solving in Mathematics.
                        Instructional Objective
Subject Type   Use correct verb forms     Use correct verb forms   No. of Items
               with singular subjects     with plural subjects
Miller, Linn, and Gronlund (2009) have shown a way of preparing a table of specifications
that breaks down a learning outcome covering a wider domain. This is done when the purpose of
testing is to determine the particular source of difficulty in mastering an outcome. It can be
very useful for learners with difficulties and confusion within one topic that has a variety of
applications.
Figure 8.1 illustrates a general tree chart for test types. If there are other types that do not appear
DECLARATIVE
• is able to state the law of supply and demand.
COMPREHENSION
• is able to explain the law of supply and demand.
APPLICATION
• is able to explain the rising prices of vegetables during summer time.

PROCEDURAL
• is able to compute the area of a rectangle.
COMPREHENSION
• is able to compare the size of two given lots in terms of area.
APPLICATION
• is able to determine the number of 1×1 tiles needed to cover a 50 ft × 100 ft hall.
Nitko (2001) gives categories of these lower-order thinking skills and some examples of generic
questions for assessing them.
Table 8.3 Categories of Lower-order Thinking Skills and Sample Generic Questions
Low-Level Thinking Skills Examples of Generic Questions
Knowledge of Terminologies What is a __________?
Knowledge of Specific facts When did _________happen?
Knowledge of Conventions Where are ________ usually found?
Knowledge of trends and sequences Name the stages in _______?
Knowledge of classifications and categories Which _____ does not belong with the others?
Knowledge of criteria What criterion will you use to judge ______?
Knowledge of methods, principles, techniques When________increases, what happens
to_____?
Comprehension What do you mean by the expression______?
Simple Interpretations What makes _________ interesting?
Solving numerical problems Use the data above to find the _________.
1. Lower order
• It is a simple comprehension question that is based on a specific reading material.
• The answer can be recalled from the material you read.
Example: Based on the article you just read, who was the 16th president of the US?
2. Higher order
• It is a supply type that requires thinking at “creating” level.
• Extended essay that requires deeper understanding.
EXAMPLE:
Considering the influence of ocean temperatures, explain why inland temperatures vary in
summer and winter to a greater degree than coastal temperatures.
1. Lower Order
• The correct option is based on a specific reading material that requires simple comprehension.
EXAMPLE:
According to what you just read, how many siblings does Anna have?
EXAMPLE:
Which one of the following factors deters students from doing well in higher
education?
Table 8.5 illustrates the relationship between learning outcomes and test types. The arrows
suggest that supply or selection types can be used for both lower-level as well as higher-level
outcomes.
The continuum runs from KNOWLEDGE through Simple Understanding to DEEP UNDERSTANDING.
Level 6: Creating
• Plan
• Generate
• Produce
• Design
• Construct
• Compose
Performance Tasks
Written
Work Sample
Simulation
Project
These are categories of thought questions used in constructing test types that can elicit complex
thinking skills.
COMPLETION TYPE - An incomplete statement with a blank is often used as the stimulus, and the
response is a constructed word, symbol, numeral, or phrase that completes the statement.
The blanks should be placed at the end or towards the end of the incomplete statement.
During the _____, there was a global economic downturn that devastated world financial
markets as well as the banking and real estate industries.
It can possibly call for diverse and ambiguous answers. Avoid providing unintended clues to the
correct answer.
Short Answer
● Rather than providing words to complete statements, relatively short answers are
provided as direct responses to questions.
● Both completion and short answer questions can be used to assess the same
learning outcomes and cognitive processes.
Improved: How much does the food caterer charge per head?
This question has a designated unit, which is 'per head'.
THE TWO SUPPLY TYPES, COMPLETION AND SHORT-ANSWER ITEMS, SHARE COMMON
POINTS:
Essay Type
● Belongs to the supply type for the simple reason that the required response will be
constructed by the students.
● Completion and short-answer items are constructed for one answer only while essay-type
items are less structured and allow the students to organize their own answers.
● This format will test deep understanding and reasoning.
● Involve higher-order thinking skills.
1. Restricted-response essay
2. Extended-response essay
1. Are you in favor of same-sex marriage? Support your answer. (Students clearly express
their arguments in support of the side they take.)
2. What new evidence do you see of climate change, and what steps can humans take to
minimize its negative consequences? (Students' evidence choices and approaches to
addressing it vary widely.)
● Restrict the use of essay questions to those learning outcomes that cannot be measured
satisfactorily by objective items.
-This will challenge the student to indulge in higher-order thinking skills instead
of just memorizing pieces of information.
● Construct questions that will call forth the skills specified in learning standards.
-Create a question that will trigger the learning competencies of the students; a
question that will contribute to the improvement of the students' learning processes.
● Phrase the question so that the student‟s task is clearly defined.
-Both teacher and student should have the same interpretations of the questions.
● Indicate an approximate time limit for each question.
-Students will then know how to budget their time so they do not get stuck on one
question.
● Avoid the use of optional questions.
-Optional questions are not a good idea because scoring is based on a fixed rubric,
and students who answer different optional questions are in effect answering different
subject matter. Those different subjects can affect the applicability of the rubric as well
as the comparability of the students' scores.
For judging a specific writing genre like an argument, the rubric shown in table 8.12 can be
adapted for analytical scoring.
1. Write the item so that the answer options are consistent with the logic in the sentence.
Align your options with the logic of the proposition.
Poor:
Four and 6 are factors of 24. Yes No
Good:
Four and 6 are factors of 24. Correct Incorrect
Poor:
T F Right to suffrage is given to citizens in a democratic country in order to
enjoy economic gains.
3. Avoid long sentences. Unnecessarily long and wordy statements do not clearly
express the significant idea.
Poor:
T F Criterion-referenced tests are interpreted based on a standard that
determines whether students have reached an acceptable level or not.
Better:
T F Standards are used to interpret criterion referenced tests.
4. Avoid insignificant or trivial facts or words. Unnecessary facts lead students to
commit errors.
Poor:
T F Legumes, beans and nuts should be avoided by people who are suffering
from gout whether inherited or not from their parents.
Better:
T F Legumes, beans, and nuts should be avoided by people with gout.
Poor:
T F All European nations are not in favor of joining the European Union.
Better:
T F All European nations are in favor of joining the European Union.
Poor:
T F Essay tests are never easy to score.
7. Avoid using vague adjectives and adverbs. Students interpret adjectives and adverbs
such as typically, usually, occasionally, and quite differently; the item then often
becomes a test of vocabulary.
Poor:
T F People from cold countries typically drink wine every day.
Better:
T F People from cold countries are fond of drinking wine.
Multiple-Choice Item
Illustrative Items
A. Monarchy
B. Parliamentary
C. Presidential
D. Federal
Among the Asian countries, one which has a government with three branches is _______.
B. China
C. Philippines
D. Thailand
Writing a good multiple-choice item requires clarity in stating the problem in the stem and
plausibility or attractiveness of the distracters.
STEM
1. All the words of the stem should be relevant to the task. This means stating the problem
briefly yet clearly, so that the student understands what is expected in an answer.
2. The stem should be meaningful by itself and should fully contain the problem. This should
especially be observed when the stem uses an incomplete-statement format. Consider this
stem:
This stem can be improved by changing its format to a direct question or adding more
information in the complete statement like:
3. The stem should use a question with only one correct or clearly best answer. Ambiguity
sets in when the stem allows for more than one best answer. The student will likely base
their answer on personal perspective instead of facts.
1. All distracters should appear plausible to uninformed test-takers. This is the key to
making the item discriminating and therefore valid. The validity of the item suffers
when there is a distracter that is obviously correct, like option D, or obviously wrong,
like option B, in the following item.
Example:
[Poor]
What is matter?
Table 8.15 Ways to Make Distracters Plausible (given by Miller, Linn, & Gronlund, 2009)
1. Use the students' most common errors.
2. Use important-sounding words (e.g., significant, accurate) that are relevant to the item
stem. But do not overdo it!
Ex.
What is a consequence of climate change?
A. Economic reparation and improvement
B. Transient loss of biodiversity
C. Diminishing water and hunger crises
D. Irreparable damage to
5. Use incorrect answers that are likely to result from student misunderstanding or
carelessness (e.g., forgets to convert feet to yards).
6. Use distracters that are homogeneous and similar in content to the correct option
(e.g., all are inventors).
Ex.
Who invented the lightning rod?
A. Thomas Edison
B. Nikola Tesla
C. Benjamin Franklin*
D. Michael Faraday
7. Use distracters that parallel in form
Caution: Distracters should distract the uninformed, but they should not result in trick
questions that mislead the knowledgeable students (do not insert “not” in a correct
answer to make a distracter)
MATCHING ITEMS
Of the three general selected-response item formats, matching items appear different. They
consist of two parallel lists of words or phrases the students are tasked to pair. The first list which is to
Illustrative Item 1
The first column describes events associated with Philippine presidents while the second column
gives their names. In the space provided, write the letter of the president that matches the
description.
Column A Column B
_______1. First president of the Republic a. Ramon Magsaysay
_______2. Declared martial law during his b. Corazon Aquino
term c. Gloria Macapagal-Arroyo
_______3. First president to resign from office d. Manuel L. Quezon
_______4. First woman president e. Fidel C. Ramos
_______5. Died in an airplane crash f. Emilio Aguinaldo
_______6. A uniformed man elected into g. Joseph Ejercito Estrada
office h. Manuel A. Roxas
i. Ferdinand Marcos
Illustrative Item 2 (for advanced level)
Column A contains theoretical postulations of how the universe came about. Match each one
with the name of the theory given in Column B. Indicate the appropriate letter to the left of the
number in Column A.
The two illustrative items exemplify the guidelines in constructing matching items (Kubiszyn
and Borich, 2010).
1. Keep the list of premises and the list of options homogenous or belonging to a
category.
In Sample 1, the premises are events associated with Philippine presidents while the
options are names of presidents. In Sample 2, Column A lists some theories in astronomy
about how the universe has evolved and Column B lists the names of the theories.
Homogeneity is a basic principle in matching items.
2. Keep the premises always in the first column and the options in the second column.
Since the premises are oftentimes descriptions of events, illustrations of principles,
functions, or characteristics, they appear longer than the options, which are most often
names, categories, objects, and parts. Ordering the two columns this way saves reading
time for the students, since they will usually read one long premise once and select the
appropriate match from a list of short words. If ordered the opposite way, the students
will read a short word as the premise and then read through long descriptions to look
for the correct answer. Especially for Sample 2, the students will normally read a
It can be seen that the matching type as a test format is quite appropriately used in assessing
knowledge outcomes, particularly recall of terminologies, classifications, and remembering of
facts, concepts, principles, formulae, and associations. Its main advantage is its efficiency in
being able to test several concepts using the same format.
Two approaches can be used in reviewing and improving items:
- Judgmental Approach
- Empirical Approach
It is always advisable for teachers to take a second look at the assessment tools
that they have devised for a specific purpose. The judgmental approach is a type of
review through teacher judgment.
Five ways and guidelines suggested by Popham (2011) for the teachers to follow in
exercising judgment:
1. Do the items follow the specific and general guidelines in writing items, especially on:
Being aligned to instructional objectives?
Making the problem clear and unambiguous?
Providing plausible options?
2. Did any items have more than one correct answer? If so, which ones?
4. Were there words in any items that confused you? If so, which ones?
5. Were the directions for the test, or for particular subsections, unclear? If so, which ones?
Item improvement using empirically based methods is aimed at improving the quality
of an item using students' responses to the test.
Test developers refer to this technical process as item analysis as it utilizes data obtained
separately for each item.
- The two indices are related, since the level of difficulty of an item contributes to its
discriminability.
- An item with a high difficulty index (a high p-value) is an easy item, but it should not
automatically be considered a weak item; rather, it may be an item that displays the
learners' capability to perform the expected outcome.
Difficulty Index
- An item's difficulty index is obtained by calculating the p-value (p), which is the
proportion of students answering the item correctly.
DIFFICULTY INDEX
P = R/T
where P = difficulty index, R = number of students who answered the item correctly, and
T = total number of students who responded to the item.
Example #1:
There were 45 students in the class who responded to item 1, and 30 answered it correctly.
Computation: p = 30/45
= 0.67
Sixty-seven percent (67%) got the item right while 33% missed it.
Example #2:
Computation: p = 10/45
= 0.22
Out of 45, only 10 or 22% got the item right while 35 or 78% missed it.
Between the two items, item 2 appears to be the much more difficult one, since only less than
a fourth of the class was able to respond to it correctly. The class shows much better
performance on item 1 than on item 2; many students still have a long way to go to master
what item 2 tests.
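The two p-value computations above can be sketched in a few lines of Python; the function name is ours, not from the notes.

```python
def difficulty_index(correct: int, total: int) -> float:
    """Difficulty index p = R/T: the proportion of students answering correctly."""
    return correct / total

# Example 1 from the notes: 30 of 45 students answered item 1 correctly.
p1 = difficulty_index(30, 45)
# Example 2 from the notes: only 10 of 45 answered item 2 correctly.
p2 = difficulty_index(10, 45)
print(round(p1, 2), round(p2, 2))  # 0.67 0.22
```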
Discrimination Index
Shows the difference that exists between the performances of those who scored high and
low in an item.
1. Positively Discriminating Item - the proportion of the high-scoring group getting the
item correct is greater than that of the low-scoring group.
2. Negatively Discriminating Item - the proportion of the high-scoring group is less than
that of the low-scoring group.
3. Non-discriminating Item - the proportion of the high-scoring group is equal to that of
the low-scoring group.
Calculation
D = Rᵤ/Tᵤ - Rₗ/Tₗ = Pᵤ - Pₗ
or, when the two groups have the same size T:
D = (Rᵤ - Rₗ)/T
where Rᵤ = number in the upper group getting the item correct, Rₗ = number in the lower
group getting the item correct, Tᵤ and Tₗ = number of students in the upper and lower
groups, and Pᵤ and Pₗ = the corresponding proportions correct.
Here are the steps to follow to obtain the proportions of the upper and lower groups
responding to the item correctly.
a. Score the test papers using a key to correction to obtain the total scores of the students.
Maximum score is the total number of objective items.
b. Order the test papers from highest to lowest score.
c. Split the test papers into halves: high group and low group
d. Obtain the p-value for the Upper group and p-value for the lower group
e. Get the discrimination index by getting the difference between the p-values.
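Steps (a) through (e) can be sketched as follows. The list-of-pairs data layout and the naive handling of tied total scores are assumptions made for illustration.

```python
def discrimination_index(papers):
    """Steps (a)-(e): rank papers by total score, split into upper and lower
    halves, and compute D = p_upper - p_lower for one item.

    `papers` is a list of (total_score, item_correct) pairs, where
    item_correct is 1 or 0; this data layout is an assumption.
    """
    ranked = sorted(papers, key=lambda t: t[0], reverse=True)  # step (b)
    half = len(ranked) // 2                                    # step (c): split
    upper, lower = ranked[:half], ranked[-half:]               # (ties handled naively)
    p_upper = sum(c for _, c in upper) / half                  # step (d)
    p_lower = sum(c for _, c in lower) / half
    return p_upper - p_lower                                   # step (e)

# Hypothetical class of 8: high scorers tend to get this item right.
papers = [(90, 1), (85, 1), (80, 1), (75, 0), (60, 1), (55, 0), (50, 0), (40, 0)]
print(discrimination_index(papers))  # 0.75 - 0.25 = 0.5
```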
Ebel and Frisbie (1991) give guidelines for selecting the satisfactory items and for
improving the rest.
Take note: Items with negative discrimination indices, even when the values are high in
magnitude, are subject to immediate revision if not deletion. With multiple-choice items,
a negative D is forensic evidence of errors in item writing. It suggests the possibility of:
Wrong key - students selected a distracter that is actually the correct answer but is not
in the answer key.
Unclear options
Ambiguous distracters
An implausible keyed option which the more informed students will not choose.
Example Table:

Student    Total Score (%)    Q1    Q2    Q3
Asif       90                 ✔     0     ✔
Sam        90                 ✔     0     ✔
Jill       80                 0     0     ✔
Charlie    80                 ✔     0     ✔
Ruben      60                 ✔     0     0
Clay       60                 ✔     0     ✔
Kelley     50                 ✔     ✔     0
Justin     50                 ✔     ✔     0
Tonya      40                 0     ✔     0
What to do: Subtract the number of students in the lower group who got the item correct
from the number of students in the upper group who got it correct. Then divide by the
number of students in each group.
Number of students in the lower group who got question 1 correct = 4
Number of students in the upper group who got question 1 correct = 4
Solution:
(4 - 4)/5 = 0/5 = 0
Discrimination Index = 0
Distracter Analysis
Distracter analysis can detect differences in how the more able students respond to the
distracters in a multiple-choice item compared with how the less able ones do.
It can also provide an index on the plausibility of the alternatives, that is if they are
functioning as good distracters.
Distracters not chosen at all especially by the uninformed students need to be revised to
increase attractiveness.
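A minimal tally of option choices by group can make this concrete. The keyed answer "C" and all the responses below are hypothetical; the point is only to show how good distracters draw the lower group while an unchosen distracter stands out.

```python
from collections import Counter

# Hypothetical responses to one multiple-choice item, keyed answer "C".
upper_responses = ["C", "C", "C", "B", "C", "C"]
lower_responses = ["A", "B", "C", "B", "A", "B"]

upper_tally = Counter(upper_responses)
lower_tally = Counter(lower_responses)
for option in "ABCD":
    # Counter returns 0 for options never chosen.
    print(option, upper_tally[option], lower_tally[option])
# Option D is never chosen, even by the lower group: an implausible
# distracter that should be revised to increase its attractiveness.
```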
Example
The sensitivity index is another empirical approach for reviewing test items, used to
infer how sensitive an item has been to instruction. It signifies a change in students'
performance as a result of instruction.
Calculation:
Si = p(post-test) - p(pre-test)
Example:
Consider an item where, in a class of 40, 80% answered it correctly in the post-test while
only 10% did in the pre-test.
Its p-value for the post-test is .80 while for the pre-test it is .10, thus:
Si = .80 - .10 = .70
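The computation above amounts to a one-line formula. In this sketch, the counts of 4 and 32 correct responses follow from the 10% and 80% figures for a class of 40; the function name is ours.

```python
def sensitivity_index(pre_correct: int, post_correct: int, class_size: int) -> float:
    """Sensitivity to instruction: Si = p(post-test) - p(pre-test)."""
    return post_correct / class_size - pre_correct / class_size

# Class of 40: 10% correct before instruction (4 students),
# 80% correct after instruction (32 students).
si = sensitivity_index(pre_correct=4, post_correct=32, class_size=40)
print(round(si, 2))  # 0.7
```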