Group 3 - Lecture Notes

PAMANTASAN NG LUNGSOD NG MAYNILA COLLEGE OF EDUCATION


CHAPTER 7: PLANNING THE TEST

DEVELOPMENT OF CLASSROOM-BASED ASSESSMENT


 Associated largely with teacher-led testing – this means that in classroom-based assessment, the teacher or professor is the one who facilitates the assessment.
 Focuses on the measurement of what the learners achieved as a result of instruction – classroom-based assessment focuses on the students' progress in learning and measures it through tests.
 The development of this type of test has been based on subject-centered measurement (Crocker and Algina, 1986) – classroom-based assessment focuses on a certain subject to measure the students' learning progress in that specific subject.

This chapter intends to assist teachers in planning the development of classroom-based assessment to ensure its validity for measuring student achievement. It provides guidance on:
 Specifying the purpose of the test from the very outset,
 Identifying what essential learning outcomes are to be measured, and
 Preparing a test blueprint that will guide the construction of items.

OVERALL TEST DEVELOPMENT PROCESS


The process of test construction for classroom testing applies the same initial steps as the construction of any instrument designed to measure a psychological construct.
A. Planning Phase – where the purpose of the test is identified, learning outcomes are clearly specified, and a table of specifications is prepared to guide the item construction phase.
B. Item Construction Phase – where test items are constructed following the item format appropriate to the specified learning outcomes of instruction.
C. Review Phase – where the items are examined and reviewed by the teacher and/or his/her peers: prior to administration, based on judgment of their alignment to the content and behavior components of the instructional competencies; and after administration, based on analysis of students' performance in each item.


IDENTIFYING THE PURPOSE OF THE TEST
 Seeks to uncover what students know and can do, to get feedback on what needs to be improved in their learning.
 Teachers can use the results to map out strategies to improve teaching.
 The results are used to address specific learning problems while instruction is still in progress.

Multiple-choice items are good at diagnosing the source of difficulty in terms of misconceptions and areas of confusion. Each option represents a type of error that students are likely to commit. Here is an example of a multiple-choice item with distracters selected to represent possible errors of students.
Item: ___ − ___ = ____ (subtraction of dissimilar fractions)
a. (Error in getting the difference of both numerators and denominators)
b. (Correct option)
c. (Error in changing dissimilar fractions to similar ones)
d. (Error in getting the difference between similar fractions)
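
Since each distracter encodes a specific error, tallying which option each student chose turns the item into a diagnostic tool for the whole class. A minimal Python sketch of this idea, with hypothetical class responses:

```python
from collections import Counter

# Each distracter maps to the error type it represents; "b" is the key.
# The mapping mirrors the item above; the responses are hypothetical.
diagnosis = {
    "a": "subtracted numerators and denominators separately",
    "b": None,  # correct option
    "c": "converted dissimilar fractions to similar ones incorrectly",
    "d": "erred in subtracting the similar fractions",
}

responses = ["b", "a", "a", "c", "b", "a", "d", "b"]  # one letter per student

# Tally only the wrong choices to surface the most common misconception.
errors = Counter(diagnosis[r] for r in responses if diagnosis[r] is not None)
print(errors.most_common(1))
# [('subtracted numerators and denominators separately', 3)] -> reteach this step
```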

SPECIFYING THE LEARNING OUTCOMES


 The learning outcomes should communicate both the specific content and the nature of the tasks to be performed. – the learning outcomes must be aligned with the lesson and with the tasks to be done in application of the lesson.
 The assessment should become a quality assurance tool for tracking student progress in attaining the curriculum standards. – the learning outcomes must be achieved through assessments, which become the tools for assuring that students are progressing with quality based on the curriculum standards.
 The process of assessment must recognize and address the different learning targets defined by the intended outcomes, from knowledge of facts and information covered by the curriculum. – the learning outcomes and different learning targets must be achieved through processes of assessment and instruction aligned with the curriculum standards.


PREPARING A TEST BLUEPRINT
 The test blueprint must appropriately determine the learning outcomes to be assessed and how they will be assessed.
 The test blueprint must help the teacher, in the planning phase, to make genuine connections in the trilogy of curriculum, instruction, and assessment.

TEST BLUEPRINT / TABLE OF SPECIFICATIONS (TOS)


A test blueprint, sometimes called a Table of Specifications, is prepared and set up to assure the composition of a good test. It specifies:
 WHAT will be tested (content), and
 HOW it will be tested (test format)

A table of specifications can come in different forms/formats depending on what the teacher is targeting and showing. McMillan (2007) suggests some rules of thumb for determining how many items are sufficient for good sampling: a minimum of ten items is needed to assess each knowledge learning target in a unit, and these should represent a good cross-section of item difficulty.
 The more important a learning outcome is, the more points should be allotted to it.
 Eighty percent (80%) of items correct for a competency is an acceptable mastery criterion (see the sketch after this list).
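
These rules of thumb are straightforward to operationalize. Below is a minimal Python sketch, with hypothetical outcome names and weights, that allocates items in proportion to the importance of each outcome and applies the 80% mastery criterion:

```python
outcomes = {  # weight = relative importance of each outcome (hypothetical)
    "Identify parts of a plant": 0.20,
    "Explain photosynthesis": 0.50,
    "Interpret a food-web diagram": 0.30,
}
TOTAL_ITEMS = 40

# More important outcomes receive proportionally more items.
blueprint = {name: round(w * TOTAL_ITEMS) for name, w in outcomes.items()}
print(blueprint)
# {'Identify parts of a plant': 8, 'Explain photosynthesis': 20,
#  'Interpret a food-web diagram': 12}

def mastered(correct: int, items: int, criterion: float = 0.80) -> bool:
    """Mastery: at least 80% of the items for a competency answered correctly."""
    return correct / items >= criterion

print(mastered(17, 20))  # True -> 85% of the photosynthesis items were correct
```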

Objectives/Skills Number of Items/Points

A simple one-way TOS has only one element: the objective/skill being tested and the corresponding number of items/points. It is called a one-way grid, showing the different learning skills within the same topic area. This is often used for skill-oriented subjects like language and reading.

Content Outcome/Skill Test Format # of Items/Points

The two-way TOS, which is the one most commonly used, shows two elements: what is tested (subject matter and outcome) and how it is tested (the type of test format). For the test or item format, two general types can be utilized:
 Objective test items – consist of questions with only one right or best answer. They require the student to select the correct response or supply the missing fact.
 Non-objective or performance tasks – require the learner to construct/design an experiment or create responses in sentences, whether in written or oral form. Usually a rubric or set of standards is used in grading these types of tests.

Subject Area   Outcome/Skills   Objective:         Objective:        Performance:
                                Alternate Form     Gap Filling       Product Assessment
                                10 items (25%)     10 items (25%)    20 points (50%), Rubrics

This is a type of expanded TOS, indicating the specific item format to be used in framing the test questions. This type of test blueprint is flexible and well suited to designing and planning a test, since the specific types of tests are shown in the table.


       Competencies
Unit   Comprehending (Concepts)   Applying (Computation)   Analysis (Problem Solving)   No. of Items

This type of TOS is used in planning long tests such as a final examination, by showing in the first column the different units of study that are meant to develop similar cognitive outcomes. This example is for Mathematics: each unit may be intended to develop a common set of skills like conceptual understanding, computational skills (application of concepts), and problem solving.

               Instructional Objective
Subject Type   Use correct verb forms with singular subjects   Use correct verb forms with plural subjects   No. of Items

Miller, Linn & Gronlund (2009) have shown a way of preparing a table of specifications that breaks down a learning outcome covering a wide domain. This is done when the purpose of testing is to determine the particular source of difficulty in mastering an outcome. It can be very useful for learners with difficulties and confusion within one topic that has varied applications.


CHAPTER 8: SELECTING AND CONSTRUCTING TEST ITEMS AND TASKS

MENU OF TEST TYPES

• Matching • Extended Essay • Experimentation • True or False
• Performance Tasks • Completion • Work Sample • Multiple Choice • Yes/No
• Binary Choice • Supply Type • Short Answer • Restricted Essay
• Identification • Enumeration • Right/Wrong • Paper-pencil Task • Drawing Test
• Oral Questioning • Selection Type • Simulation • Project

Figure 8.1 illustrates a general tree chart of test types. If there are other types that do not appear here, they may be variants of one of the sub-types.


RELATING TEST TYPES WITH LEVELS OF LEARNING OUTCOMES
• A review of the curricular frameworks of educational systems across various countries shows common integral domains that govern their content and performance standards in different subject areas.
• Basic to all are Knowledge, Skills and Processes, Understanding, Product, and Affect (i.e., attitudes and values).
A. MEASURING KNOWLEDGE AND SIMPLE UNDERSTANDING
• Knowledge, as it appears in cognitive taxonomies (Bloom, 1956; Anderson & Krathwohl, 2004) as the simplest and lowest level, is categorized further by what thinking process is involved in learning.
• The revision of Bloom's taxonomy (Anderson & Krathwohl, 2004) recognizes how remembering can be viewed not only as being able to recall but also as being necessary in learning interrelationships among basic elements and in learning methods, strategies and procedures.
• McMillan (2007) refers to the latter two as simple understanding, requiring "comprehension of concepts, ideas and generalizations," known as declarative knowledge, and application of skills and procedures learned in new situations, referred to as procedural knowledge.

Table 8.2 Levels of Declarative and Procedural Knowledge


LEVEL: Knowledge
  DECLARATIVE: Remembers, restates, defines, identifies, recognizes, names, reproduces, or selects specific facts, concepts, principles, rules and theories.
  PROCEDURAL: Remembers, restates, defines, identifies, recognizes, names, reproduces, or selects correct procedures, steps, skills, or strategies.

LEVEL: Simple Understanding: Comprehension
  DECLARATIVE: Converts, translates, distinguishes, explains, provides examples, summarizes, interprets, infers or predicts in own words, essential meanings of concepts and principles.
  PROCEDURAL: Converts, translates, distinguishes, explains, provides examples, summarizes, interprets, infers or predicts in own words, correct procedures, steps, skills and strategies.

LEVEL: Simple Understanding: Application
  DECLARATIVE: Uses existing knowledge of concepts, principles, and theories, in new situations, to solve problems, interpret information and construct responses.
  PROCEDURAL: Uses existing knowledge of correct procedures, steps, skills, or strategies, in new situations, to solve problems, interpret information and construct responses.

DECLARATIVE
  Knowledge: is able to state the law of supply and demand.
  Comprehension: is able to explain the law of supply and demand.
  Application: is able to explain the rising prices of vegetables during summer time.

PROCEDURAL
  Knowledge: is able to compute the area of a rectangle.
  Comprehension: is able to compare the size of two given lots in terms of area.
  Application: is able to determine the number of 1×1 tiles needed to cover a 50 ft × 100 ft hall.

Nitko (2001) gives categories of these lower-order thinking skills and some examples of generic questions for assessing them.

Table 8.3 Categories of Lower-order Thinking Skills and Sample Generic Questions
Low-Level Thinking Skills                       Examples of Generic Questions
Knowledge of terminologies                      What is a __________?
Knowledge of specific facts                     When did _________ happen?
Knowledge of conventions                        Where are ________ usually found?
Knowledge of trends and sequences               Name the stages in _______.
Knowledge of classifications and categories     Which _____ does not belong with the others?
Knowledge of criteria                           What criterion will you use to judge ______?
Knowledge of methods, principles, techniques    When ________ increases, what happens to _____?
Comprehension                                   What do you mean by the expression ______?
Simple interpretations                          What makes _________ interesting?
Solving numerical problems                      Use the data above to find the _________.
Manipulating symbols and equations              Show that _________ equals _______.

CONSTRUCTED-RESPONSE TYPE

Short-Answer Item and Extended Essay

This format has two types:

1. Lower order
• A simple comprehension question based on a specific reading material.
• The answer can be recalled from the material read.
Example: Based on the article you just read, who was the 16th president of the US?

2. Higher order
• A supply type that requires thinking at the "creating" level.
• An extended essay that requires deeper understanding.

EXAMPLE:
Considering the influence of ocean temperatures, explain why inland temperatures vary in summer and winter to a greater degree than coastal temperatures.

SELECTED-RESPONSE TYPE

1. Lower Order
• The correct option is based on a specific reading material and requires simple comprehension.

EXAMPLE:
According to what you just read, how many siblings does Anna have?
A. 4
B. 5
C. 9

2. Higher Order
• The correct option requires analysis of the alternatives given.

EXAMPLE:
Which one of the following factors is a deterrent to students doing well in higher education?
a. Political orientation
b. Academic strategies
c. Comprehension ability
d. Academic discourse

B. MEASURING DEEP UNDERSTANDING

• McMillan (2007) uses a knowledge/understanding continuum to illustrate the relative degree of understanding, from knowledge to simple understanding to deep understanding.

Table 8.4 Alignment of Learning Outcomes and Cognitive Levels

Knowledge/Understanding Continuum: Knowledge → Simple Understanding → Deep Understanding

Cognitive Levels/Levels of Learning Outcomes

KNOWLEDGE
Level 1: Remembering – Recall, Recognize, Name, Describe

SIMPLE UNDERSTANDING
Level 2: Comprehending – Interpret, Exemplify, Classify, Compare, Explain, Infer
Level 3: Applying – Solve, Apply, Modify, Demonstrate, Employ, Calculate, Generate

DEEP UNDERSTANDING
Level 4: Analyzing – Organize, Transform, Distinguish, Diagnose, Outline, Deconstruct
Level 5: Evaluating – Critique, Assess, Defend, Justify, Appraise, Recommend
Level 6: Creating – Plan, Generate, Produce, Design, Construct, Compose

Table 8.5 illustrates the relationship between learning outcomes and test types. The arrows suggest that supply or selection types can be used for lower-level as well as higher-level outcomes.

Table 8.5 Alignment of Learning Outcomes to Test Types

Knowledge/Understanding Continuum: Knowledge → Simple Understanding → Deep Understanding

Cognitive Levels/Levels of Learning Outcomes

KNOWLEDGE
Level 1: Remembering – Recall, Recognize, Name, Describe

SIMPLE UNDERSTANDING
Level 2: Comprehending – Interpret, Exemplify, Classify, Compare, Explain, Infer
Level 3: Applying – Solve, Apply, Modify, Demonstrate, Employ, Calculate, Generate

DEEP UNDERSTANDING
Level 4: Analyzing – Organize, Transform, Distinguish, Diagnose, Outline, Deconstruct
Level 5: Evaluating – Critique, Assess, Defend, Justify, Appraise, Recommend
Level 6: Creating – Plan, Generate, Produce, Design, Construct, Compose

Test types aligned to each band of the continuum:

Knowledge and Simple Understanding
  Supply type: Completion, Short-Answer
  Selection type: Binary Choice, Multiple-Choice, Matching Type

Deep Understanding
  Supply type: Essay-Restricted, Essay-Extended
  Selection type: Multiple-Choice, Interpretive Items
  Performance tasks: Written, Work Sample, Simulation, Project

Miller, Linn & Gronlund (2009) give categories of thought questions used in constructing test types that can elicit complex thinking skills.


Angelo & Cross (1993) designed classroom assessment tasks (CATs), which are performance-based in nature.


CONSTRUCTING OBJECTIVE SUPPLY TYPES OF ITEMS

COMPLETION TYPE – An incomplete statement with a blank is often used as the stimulus, and the response is a constructed word, symbol, numeral or phrase that completes the statement.

The blank should be placed at the end or towards the end of the incomplete statement:

The outermost shell of a terrestrial planet is called _______.

If the blank is placed at the beginning, like:

During the _____, there was a global economic downturn that devastated world financial markets as well as the banking and real estate industries.

it can call for diverse and ambiguous answers. Avoid providing unintended clues to the correct answer.

Short Answer

● Rather than providing words to complete statements, relatively short answers are
provided as direct responses to questions.
● Both completion and short answer questions can be used to assess the same
learning outcomes and cognitive processes.


(See Table 8.9 for example)

Writing short-answer items (McMillan):

1. State the item so that only one answer is correct.
2. State the item so that the required answer is brief. A long response is unnecessary, and it can limit the number of items that students can answer within the allotted time.
3. Do not use questions verbatim from textbooks, because the test will turn out to be a memory test instead of a comprehension test.
4. Designate the units required for the answer. Without a designated unit, the answer may be wrong because of a different mindset.
Example:
Poor: How much does the food caterer charge?
(No specific unit is given, such as cost per head, per dish, per plate, or as a full package. Different mindsets may answer from different perspectives.)

Improved: How much does the food caterer charge per head?
This question has a designated unit, which is "per head."


5. State the items precisely and with words that students understand.
Example:
Poor: As viewed by creatures from the earth, when does the blood moon appear in the
evening?
(Unnecessary phrase used.)
Improved: When does the blood moon appear in the evening?
(Precise and straight to the point.)

THE TWO SUPPLY TYPES, COMPLETION AND SHORT-ANSWER ITEMS, HAVE THESE POINTS IN COMMON:

● Appropriate for assessing learning outcomes involving knowledge and simple understanding.
● Capable of assessing both declarative and procedural knowledge.
● Both are easy and simple to construct.
● Both are objectively scored, since a key to correction can be prepared in advance.
● Both need an ample number of items to assess a learning outcome.

NON-OBJECTIVE SUPPLY TYPE

Essay Type

● Belongs to the supply type for the simple reason that the required response will be constructed by the students.
● Completion and short-answer items are constructed for one answer only, while essay-type items are less structured and allow the students to organize their own answers.
● This format tests deep understanding and reasoning.
● It involves higher-order thinking skills.

TWO VARIATIONS OF ESSAY ITEMS

1. Restricted-response essay
2. Extended-response essay


A. RESTRICTED-RESPONSE ESSAY – limited coverage of content.

1. (Restricted content) The tourist spot to be described should be one found in Luzon.
   Item: What is a famous tourist spot on the island of Luzon, and why is it popular?

2. (Restricted length) The length of the discourse should not exceed half a sheet of paper.
   Item: On a half sheet of paper, describe the benefits of Sagip Kapamilya as an organization.

3. (Restricted form) The response should be organized in a two-tier outline form.
   Item: Prepare a two-tier outline of an advocacy plan for community involvement in waste reduction and disposal.

4. (Restricted perspective) The response will only be acceptable when the explanation adheres to the theory of symbolic interactionism.
   Item: Describe the origin of religion according to symbolic interactionism theory.

B. EXTENDED-RESPONSE ESSAY – the student is free to organize and expound on ideas.

1. Are you in favor of same-sex marriage? Support your answer.
   (Students clearly express their arguments in support of the side they take.)

2. What new evidence do you see of climate change, and what steps can humans take to minimize its negative consequences?
   (Students' choices of evidence and approaches to addressing it vary widely.)

3. Describe the impact current socio-economic issues have on the lives of Filipinos.
   (The student is free to focus on any socio-economic issue and choose which aspect of people's lives to describe.)

CONSTRUCTING ESSAY QUESTIONS (Miller, Linn & Gronlund)

● Restrict the use of essay questions to those learning outcomes that cannot be measured satisfactorily by objective items.
  - This challenges the student to engage in higher-order thinking skills instead of just memorizing pieces of information.
● Construct questions that will call forth the skills specified in the learning standards.
  - Create a question that will trigger the learning competencies of the students and contribute to the improvement of their learning processes.
● Phrase the question so that the student's task is clearly defined.
  - Both teacher and student should have the same interpretation of the question.
● Indicate an approximate time limit for each question.
  - Students will know how to budget their time so they do not get stuck on the same question.
● Avoid the use of optional questions.
  - Optional questions are not a good idea because scoring relies on a fixed rubric; optional questions cover different subject matter, so students answering different questions are effectively taking different tests, which affects the rubric as well as the students' scores.

Analytic scoring is a method of evaluating student work that assigns a separate score to each dimension of a task. It is most often used when there is a need to assess how well students perform on the individual dimensions of a whole product or performance.


Table 8.11 illustrates the analytic scoring structure.

For judging a specific writing genre like an argument, the rubric shown in Table 8.12 can be adapted for analytic scoring.

Table 8.12 Rubric for Analytic Scoring


When the attributes are considered together to arrive at an overall judgment or impression, holistic scoring is in use.
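
The contrast can be made concrete with a minimal sketch; the dimensions, point scales, and scores below are hypothetical, not taken from Tables 8.11 and 8.12:

```python
# Analytic scoring: a separate score per dimension, reported individually.
analytic = {"claim": 5, "evidence": 3, "organization": 4, "mechanics": 4}  # each out of 5
print(sum(analytic.values()), "/ 20")                         # 16 / 20 overall
print("Needs feedback on:", min(analytic, key=analytic.get))  # evidence

# Holistic scoring: the attributes are weighed together as one impression.
holistic = 4  # a single overall rating on a 1-5 scale
```

Analytic scoring thus preserves the diagnostic detail (which dimension needs feedback), while holistic scoring is faster when only an overall judgment is needed.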

SUGGESTIONS GIVEN BY MILLER, LINN & GRONLUND (2009, p. 254)

1. Prepare an outline of the expected answer in advance.
2. Use the scoring rubric that is most appropriate.
3. Decide how to handle factors that are irrelevant to the learning outcomes being measured.
4. Evaluate all responses to one question before going on to the next one.
5. When possible, evaluate the answers without looking at the student's name.
6. If especially important decisions are to be based on the results, obtain two or more independent ratings.

Constructing Selected-Response Types

Selected-response items require choosing the best or most correct option to answer a problem. There are three sub-types:
 Alternate form or binary choice
 Multiple-choice
 Matching type

A. Binary or Alternate Form

WAYS TO MAKE GOOD BINARY-CHOICE ITEMS

1. Write the item so that the answer options are consistent with the logic in the sentence. Align your options with the logic of the proposition.

Poor:
Four and 6 are factors of 24. Yes No
Good:
Four and 6 are factors of 24. Correct Incorrect

2. Focus on a single fact or idea in the item.

Poor:
T F The right to suffrage is given to citizens in a democratic country in order to enjoy economic gains.

Better:
T F Citizens in a democratic society have the right of suffrage.

3. Avoid long sentences. Unnecessarily long and wordy statements do not clearly express the significant idea.

Poor:
T F Criterion-referenced tests are interpreted based on a standard that determines whether students have reached an acceptable level or not.
Better:
T F Standards are used to interpret criterion-referenced tests.

4. Avoid insignificant or trivial facts or words. Unnecessary facts lead students to commit errors.

Poor:
T F Legumes, beans and nuts should be avoided by people who are suffering from gout, whether inherited or not from their parents.
Better:
T F Legumes, beans, and nuts should be avoided by people with gout.

5. Avoid negative statements.

Poor:
T F All European nations are not in favor of joining the European Union.
Better:
T F All European nations are in favor of joining the European Union.

6. Avoid inadvertent clues to the answer.

Poor:
T F Essay tests are never easy to score.
Better:
T F Essay tests are difficult to score.

7. Avoid using vague adjectives and adverbs. Students interpret adjectives and adverbs such as "typically," "usually," "occasionally," and "quite" differently; the item often becomes a test of vocabulary instead.

Poor:
T F People from cold countries typically drink wine every day.
Better:
T F People from cold countries are fond of drinking wine.

Multiple-Choice Item

 Another selected-response item format is the multiple choice.
 According to McMillan (2007), the multiple-choice item can assess whether students can use reasoning as a skill, similar to binary-choice items, and can tap students' knowledge and skills in performing problem solving, decision making, or other reasoning tasks.
 The construction of a multiple-choice item is not as easy as that of a binary-choice item.


Table 8.14 illustrates the item structure of a multiple-choice item. The first column shows the stimulus: it consists of a stem, which contains the problem in the form of a direct question or incomplete statement, and the options, which offer the possible answers and the distracters. The second column shows the response: selecting the correct or best answer from the options.

Illustrative Items

1. Direct-question form (best answer version)

What form of government is ruled by a Prime Minister?

A. Monarchy

B. Parliamentary

C. Presidential

D. Federal

2. Incomplete Statement form

Among the Asian countries, one which has a government with three branches is _______.


A. Japan

B. China

C. Philippines

D. Thailand

Writing good multiple-choice items requires clarity in stating the problem in the stem, and plausibility or attractiveness of the distracters.

STEM

1. All the words of the stem should be relevant to the task. This means stating the problem briefly yet clearly, so that the student will understand what he or she is expected to answer.

2. The stem should be meaningful by itself and should fully contain the problem. This should especially be observed when the stem uses an incomplete-statement format. Consider this stem:

The constitution is ______________.

This stem can be improved by changing its format to a direct question or adding more information to the incomplete statement, like:

What does the constitution of an organization provide? (Direct-question format)

The constitution of an organization provides ______________. (Incomplete-statement format)

3. The stem should use a question with only one correct or clearly best answer. Ambiguity sets in when the stem allows more than one best answer; students will likely base their answer on personal perspective instead of facts.


DISTRACTERS

1. All distracters should appear plausible to uninformed test-takers. This is the key to making the item discriminating and therefore valid. The validity of the item suffers when a distracter is obviously correct, like option D, or obviously wrong, like option B, in the following item.

Example:

[Poor]

What is matter?

A. Everything that surrounds us.
B. All things bright and beautiful.
C. Things we see and hear.
D. Anything that occupies space and has mass.

Table 8.15 Ways to Make Distracters Plausible (given by Miller, Linn, & Gronlund, 2009)

1. Use the students' most common errors.

2. Use important-sounding words (e.g., "significant," "accurate") that are relevant to the item stem, but do not overdo it.
   Ex.
   What is a consequence of climate change?
   A. Economic reparation and improvement
   B. Transient loss of biodiversity
   C. Diminishing water and hunger crises
   D. Irreparable damage to ecosystem*

3. Use words that have verbal associations with the item stem (e.g., politician -> political).
   Ex.
   Which of the following has been Manny Pacquiao's professional career for 26 years?
   A. Politician
   B. Boxer*
   C. Philanthropist
   D. Preacher

4. Use "textbook language" or other phraseology that has the appearance of truth.
   Ex.
   Tokhang mandates house-to-house visits and "persuasion of suspected illegal drug personalities to stop their illegal drug activities." Which right does it primarily violate?
   A. Right to due process*
   B. Right against self-incrimination
   C. Right against double jeopardy
   D. Right to liberty

5. Use incorrect answers that are likely to result from student misunderstanding or carelessness (e.g., forgetting to convert feet to yards).

6. Use distracters that are homogeneous and similar in content to the correct option (e.g., all are inventors).
   Ex.
   Who invented the lightning rod?
   A. Thomas Edison
   B. Nikola Tesla
   C. Benjamin Franklin*
   D. Michael Faraday

7. Use distracters that parallel in form, or are grammatically consistent with, the item stem.
   Ex.
   How long do plastic straws take to break down?
   A. 100 years
   B. 200 years*
   C. 300 years
   D. 400 years

8. Make distracters similar to the correct answer in length, vocabulary, sentence structure, and complexity of thought.
   Ex.
   You are a psychiatrist in session with a patient who tells you he thinks his roommate is out to get him. The patient is mildly schizophrenic. The patient asks you if you can keep a secret, and before you can answer he tells you he plans to kill the roommate "when the time is right." What is the most appropriate course of action for you to take?
   A. Attempt to discourage the patient from his plan while respecting confidentiality
   B. Inform the medical director of your practice of the situation and let him handle it
   C. Inform the patient's roommate that he is in danger
   D. Inform the patient's roommate and the police that the roommate is in danger*

Caution: Distracters should distract the uninformed, but they should not result in trick questions that mislead knowledgeable students (do not insert "not" into a correct answer to make a distracter).

2. Randomly assign correct answers to alternative positions. Item writers have a tendency to assign the correct answer to the third alternative as they run short of incorrect alternatives. Students who are used to taking multiple-choice tests then wisely choose option "C" when guessing, for a greater chance of being correct. No deliberate order (e.g., ABCDABCD, AACCBBDD) should be followed in assigning the correct answers for ease in scoring. As much as possible, have an equal number of correct answers distributed randomly across the option positions.

3. Avoid using "all-of-the-above" or "none-of-the-above" as distracters. Item writers think that using them adds difficulty to the item, since it is a way to test reasoning ability. However, students, without much thinking, will tend to choose these "of-the-above" options haphazardly when they see at least two options as correct or incorrect, without considering the remaining ones. When forced to come up with a fourth plausible option and there seems to be none available except "all-of-the-above" or "none-of-the-above," do not make them the correct answer.

MATCHING ITEMS

Of the three general selected-response item formats, matching items appear different. They consist of two parallel lists of words or phrases that students are tasked to pair. The first list, which is to be matched, is referred to as the "premises," while the other list, from which a match is chosen based on some kind of association, is referred to as the "options."

Table 8.16 Matching Type Item Structure

List of Premises: words or phrases to be matched or associated with an appropriate word.
List of Options: alternatives or options from which to select what will match the premise.

Illustrative Item 1

The first column describes events associated with Philippine presidents, while the second column gives their names. In the space provided, write the letter of the president that matches the description.

Column A
_______ 1. First president of the Republic
_______ 2. Declared martial law during his term
_______ 3. First president to resign from office
_______ 4. First woman president
_______ 5. Died in an airplane crash
_______ 6. A uniformed man elected into office

Column B
a. Ramon Magsaysay
b. Corazon Aquino
c. Gloria Macapagal-Arroyo
d. Manuel L. Quezon
e. Fidel C. Ramos
f. Emilio Aguinaldo
g. Joseph Ejercito Estrada
h. Manuel A. Roxas
i. Ferdinand Marcos
Illustrative Item 2 (for advanced level)

Column A contains theoretical postulations of how the universe came about. Match each one with the name of the theory given in Column B. Indicate the appropriate letter to the left of the number in Column A.

Column A
_______ 1. Large-scale features of the universe do not change over time.
_______ 2. About 90% of the matter in the universe does not interact with radiation.
_______ 3. The spiral arms in galaxies cannot be permanent condensations of matter.
_______ 4. The planets describe closed orbits about the Earth.
_______ 5. Planets were formed from small solid bodies caused by eruptions of stars.

Column B
a. Dark matter theory
b. Density wave theory
c. Superdense theory
d. Infinite worlds theory
e. Galactic rotation theory
f. Inflationary universe theory
g. Planetesimal theory
h. Ptolemaic theory
i. Steady-state theory

The two illustrative items exemplify the guidelines in constructing matching items (Kubiszyn and Borich, 2010).

1. Keep the list of premises and the list of options homogeneous, or belonging to one category.
In Sample 1, the premises are events associated with Philippine presidents while the options are names of presidents. In Sample 2, Column A lists postulations of theories in astronomy about how the universe has evolved and Column B lists the names of the theories. Homogeneity is a basic principle in matching items.

2. Keep the premises always in the first column and the options in the second column.
Since the premises are often descriptions of events, illustrations of principles, functions or characteristics, they appear longer than the options, which are most often names, categories, objects, and parts. Ordering the two columns this way saves reading time for the students, since they will usually read one long premise once and select the appropriate match from a list of short words. If ordered the opposite way, the students will read a short word as the premise and then read through long descriptions to look for the correct answer. Especially for Sample 2, the students will normally read a theoretical postulation first and then logically go through the names of the theories given in Column B. Imagine the time spent if the opposite process were followed.

3. Keep the lists in the two columns unequal in number.
The basic reason for this is to avoid guessing. The options in Column B are usually more numerous than the premises in Column A. If the two lists are equal in number, students can strategically resort to wise elimination in finding the rest of the pairs. There are matching items, however, where the options are far fewer than the premises. This is recommended when testing the ability to classify. For instance, Column A could be a list of 10 animals to be classified and Column B could be just 4 categories of mammals. With this format, it is important to mention in the test directions that an option can be used more than once.

4. Test directions should always describe the basis for matching.
"Match Column A with Column B" is a no-no in the matching type. Describe clearly what is to be found in the two columns, how they are associated, and how matching will be done. Invalid scores could be due to extraneous factors like misinterpretation of how matching is to be done, misunderstanding of how to use the given options (e.g., using an option only once when the teacher allows using an option more than once), and limiting the number of items answered when few options are given.

5. Keep the number of premises to no more than eight (8) items, as shown in the two sample items.
Fatigue sets in when there are too many items in a set and, again, test validity suffers. If an item writer feels that there are many concepts to be tested, dividing them into sets is a better strategy. It is also suggested that a set of matching items should appear on one page only and not be carried over to the next page. Frequently flipping the test papers just to look for appropriate options requires additional time.

6. Ambiguous lists should be avoided.
This is especially true in preparing the options for the second column. There should be only one option appropriately associated with a premise, unless it is unequivocally mentioned that an option can be used more than once, as noted in #4. Ambiguity often occurs when matching events and places, events and names, or descriptions and characters. For instance, in a description-character matching exercise, a premise like "mean to Cinderella" may carelessly list both "step-mother" and "step-sister" as options, which are both correct. Either the premise should be improved or one option removed.

It can be seen that the matching type as a test format is quite appropriately used in assessing knowledge outcomes, particularly recall of terminologies and classifications, and remembering facts, concepts, principles, formulae, and associations. Its main advantage is its efficiency in being able to test several concepts using the same format.


CHAPTER 9: IMPROVING A CLASSROOM-BASED ASSESSMENT TEST

Two approaches to undertaking item improvement:

- Judgmental approach
- Empirical approach

Judgmental Item Improvement

- Makes use of human judgment in reviewing the items.
- The ones who give judgment are the teachers themselves.

What is the teachers' own review?

- It is always advisable for teachers to take a second look at the assessment tools that they have devised for a specific purpose.
- A type of review done by the teacher himself or herself.

Five ways and guidelines suggested by Popham (2011) for teachers to follow in exercising judgment:

 Adherence to item-specific guidelines and general item-writing commandments – teachers use these guidelines to check how well the items have been planned and written, particularly their alignment to the intended instructional outcomes.
 Contribution to score-based inference – the teacher examines whether the scores generated by the test can contribute to making valid inferences about the learners.
 Accuracy of content – this review should especially be considered when tests were developed some time ago. Changes due to new discoveries or developments can redefine the content of a summative test. If this happens, the items or the key to correction may have to be revisited.
 Absence of content gaps – this review criterion is especially useful in strengthening the score-based inference capability of the test. If the current tool misses important content now prescribed by a new curriculum standard, the score will likely not give an accurate description of what is expected to be assessed. The teacher always ensures that the assessment tool matches what is currently required to be learned. This is a way to check the validity of the test.
 Fairness – the discussions on item-writing guidelines always warn against unintentionally helping uninformed students obtain higher scores through inadvertent grammatical clues, unattractive distracters, ambiguous problems and messy test instructions. Sometimes unfairness can happen because of undue advantage enjoyed by a particular group, like those seated in front of the classroom or those coming from a particular socio-economic level. Getting rid of faulty and biased items and writing clear instructions definitely add to the fairness of the test.

What is a peer review?

- Some schools encourage peer or collegial review of assessment instruments among teachers. Time is provided for this activity, and it has almost always yielded good results for improving tests and performance-based assessment tasks. During these teacher dyad or triad sessions, those teaching the same subject area can openly review together the classroom tests and tasks they have devised against some consensual criteria.

A. Do the items follow the specific and general guidelines in writing items especially on:
 Being aligned to instructional objectives?
 Making the problem clear and unambiguous?
 Providing plausible options?


 Avoiding unintentional clues?
 Having only one correct answer?
B. Are the items free from inaccurate content?
C. Are the items free from obsolete content?
D. Are the test instructions clearly written for students to follow?
E. Is the level of difficulty of the test appropriate to the level of the learners?
F. Is the test fair to all kinds of students?

What is a student review?

- Engaging students in reviewing items has become a laudable practice for improving classroom tests.
- The judgment is based on the students' experience in taking the test, and their impressions and reactions during the testing event.
- The process can be efficiently carried out through the use of a review questionnaire.

Item-Improvement Questionnaire for Students (Popham, 2011)

1. If any of the items seemed confusing, which ones were they?

2. Did any items have more than one correct answer? If so, which ones?

3. Did any items have no correct answers? If so, which ones?

4. Were there words in any items that confused you? If so, which ones?

5. Were the directions for the test, or for particular subsections, unclear? If so, which ones?


EMPIRICAL APPROACH / EMPIRICALLY BASED PROCEDURES

 Item improvement using empirically based methods is aimed at improving the quality of an item using students' responses to the test.

 Test developers refer to this technical process as item analysis, as it utilizes data obtained separately for each item.

Norm-Referenced Test

- The two indices are related, since the level of difficulty of an item contributes to its discriminability.

Criterion-Referenced Test

- An item with a high difficulty index is not considered an "easy item" and therefore a weak item, but rather an item that displays the capability of the learners to perform the expected outcome.

Difficulty Index

- An item's difficulty index is obtained by calculating the p-value (p), which is the proportion of students answering the item correctly.

Discrimination Index

- Shows the relationship between a student's performance on an item and his/her total performance on the test, represented by the total score.
- An item statistic that can reveal useful information for improving an item.


Examples:

DIFFICULTY INDEX

p = R / T

p = difficulty index
R = total number of students answering the item right
T = total number of students answering the item

Example #1:

There were 45 students in the class who responded to item 1, and 30 answered it correctly.

Computation: p = 30/45 = 0.67

Interpretation: p-value of 0.67. Sixty-seven percent (67%) got the item right while 33% missed it.

Example #2:

In the same class, only 10 responded correctly to item 2.

Computation: p = 10/45 = 0.22

Interpretation: p-value of 0.22. Out of 45, only 10 (22%) got the item right while 35 (78%) missed it.
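
The computation is a one-liner; here is a minimal Python sketch reproducing the two examples above:

```python
def difficulty_index(right: int, total: int) -> float:
    """p = R / T: the proportion of students answering the item correctly."""
    return right / total

# The two examples above: 45 students responded to each item.
print(round(difficulty_index(30, 45), 2))  # 0.67 -> item 1
print(round(difficulty_index(10, 45), 2))  # 0.22 -> item 2 is much harder
```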


Remember:

For a norm-referenced test:

Between the two items, item 2 appears to be a much more difficult item, since less than a fourth of the class was able to respond correctly.

For a criterion-referenced test:

The class shows much better performance on item 1 than on item 2. It is still a long way for many to master item 2.

Discrimination Index

 Shows the difference between the performance of those who scored high and those who scored low on an item.

Nature of discrimination directions

1. Positively discriminating item – the proportion of the high-scoring group getting the item right is greater than that of the low-scoring group.
2. Negatively discriminating item – the proportion of the high-scoring group is less than that of the low-scoring group.
3. Not discriminating – the proportion of the high-scoring group is equal to that of the low-scoring group.

Calculation

D = Rᵤ/Tᵤ - Rı/Tı

where D = item discrimination index
Rᵤ = number in the upper group getting the item correct
Tᵤ = number in the upper group
Rı = number in the lower group getting the item correct
Tı = number in the lower group

Another calculation (Kubiszyn and Borich, 2010):

D = (Rᵤ - Rı) / T

where Rᵤ = number in the upper group getting the item correct
Rı = number in the lower group getting the item correct
T = number of students in either group

Another calculation (Popham, 2011):

D = Pᵤ - Pı

where Pᵤ is the p-value for the upper group (Rᵤ/Tᵤ)
Pı is the p-value for the lower group (Rı/Tı)

Here are the steps to follow to obtain the proportions of the upper and lower groups responding to the item correctly:

a. Score the test papers using a key to correction to obtain the total scores of the students. The maximum score is the total number of objective items.
b. Order the test papers from highest to lowest score.
c. Split the test papers into halves: high group and low group.
d. Obtain the p-value for the upper group and the p-value for the lower group.
e. Get the discrimination index by getting the difference between the p-values.

Guidelines for selecting satisfactory items and improving the rest (Ebel & Frisbie, 1991):

Discrimination Index    Item Evaluation
.40 and above           Very good items
.30 - .39               Reasonably good items, but possibly subject to improvement
.20 - .29               Marginal items, usually needing improvement
.19 and below           Poor items, to be rejected or improved by revision

Take note: Items with negative discrimination indices, however high their absolute values, are subject to immediate revision if not deletion. With multiple-choice items, a negative D is forensic evidence of errors in item writing. It suggests the possibility of:

 A wrong key – students selected a distracter which is the correct answer but is not in the answer key.
 An unclear option
 Ambiguous distracters
 An implausible keyed option, which the more informed students will not choose.

Example Table:

Student    Total Score (%)    Q1    Q2    Q3
Asif       90                 ✔     0     ✔
Sam        90                 ✔     0     ✔
Jill       80                 0     0     ✔
Charlie    80                 ✔     0     ✔
Sonya      70                 ✔     0     ✔
Ruben      60                 ✔     0     0
Clay       60                 ✔     0     ✔
Kelley     50                 ✔     ✔     0
Justin     50                 ✔     ✔     0
Tonya      40                 0     ✔     0

What to do: Subtract the number in the lower group who got the item correct from the number in the upper group who got the item correct, then divide by the number of students in each group.

Number of students in the lower group who got question 1 correct = 4
Number of students in the upper group who got question 1 correct = 4
Number of students in each group = 5

Solution:

D = (4 - 4) / 5 = 0 / 5 = 0

Discrimination Index = 0
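
The same computation, as a minimal Python sketch over the table above; question 1 yields D = 0, while question 2 shows what a negative index looks like:

```python
def discrimination_index(r_upper: int, r_lower: int, group_size: int) -> float:
    """D = Ru/Tu - Rl/Tl; with equal-sized groups this is (Ru - Rl) / T."""
    return r_upper / group_size - r_lower / group_size

# Question 1: papers ordered by total score and split into halves.
upper = [1, 1, 0, 1, 1]  # Asif, Sam, Jill, Charlie, Sonya
lower = [1, 1, 1, 1, 0]  # Ruben, Clay, Kelley, Justin, Tonya
print(discrimination_index(sum(upper), sum(lower), 5))  # 0.0 -> not discriminating

# Question 2: only lower-group students answered correctly -> negative D,
# a signal to check the answer key or revise the item.
print(discrimination_index(0, 3, 5))  # -0.6
```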

Distracter Analysis

 This is another empirical procedure for discovering areas for item improvement. It utilizes an analysis of the distribution of responses across the distracters. Especially when the difficulty and discrimination indices suggest that the item is a candidate for revision, distracter analysis can be a useful follow-up.

 It can detect differences in how the more able students respond to the distracters of a multiple-choice item compared with how the less able ones do.

 It can also provide an index of the plausibility of the alternatives, that is, whether they are functioning as good distracters.

 Distracters not chosen at all, especially by uninformed students, need to be revised to increase their attractiveness.

Example
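
A minimal sketch of such an analysis, with hypothetical response data for a four-option item keyed "C":

```python
from collections import Counter

upper_group = ["C", "C", "B", "C", "C"]  # options chosen by more able students
lower_group = ["B", "A", "C", "B", "B"]  # options chosen by less able students

for label, group in (("upper", upper_group), ("lower", lower_group)):
    print(label, Counter(group))
# upper Counter({'C': 4, 'B': 1})
# lower Counter({'B': 3, 'A': 1, 'C': 1})
# Reading the distribution: the keyed option "C" attracts the upper group,
# distracter "B" pulls the lower group (working as intended), and
# distracter "D" was never chosen -> implausible, so revise it.
```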


Sensitivity to Instruction Index

 Another empirical approach for reviewing test items infers how sensitive an item has been to instruction. The index signifies a change in students' performance as a result of instruction.

 The information is useful for criterion-referenced tests, which aim at determining whether mastery learning has been attained after a designated or prescribed instructional period.

 The question being addressed is directional, i.e., is student performance better after instruction is given? In the context of item performance, Si indicates whether the p-value obtained for the item in the post-test is greater than the p-value obtained in the pre-test.

Calculation:

Sensitivity to instruction: Si = p(post) - p(pre)

Example:

Consider an item where, in a class of 40, 80% answered it correctly in the post-test while only 10% did in the pre-test.

Its p-value for the post-test is .80 while for the pre-test it is .10, thus:

Si = p(post) - p(pre) = .80 - .10 = .70
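
The same example as a minimal Python sketch:

```python
def sensitivity_to_instruction(p_post: float, p_pre: float) -> float:
    """Si = p(post) - p(pre): the gain in an item's p-value after instruction."""
    return p_post - p_pre

# The example above: 80% correct after instruction versus 10% before.
print(round(sensitivity_to_instruction(0.80, 0.10), 2))  # 0.7 -> highly sensitive
```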
