Educ 6 M2-Midterm


MODULE 2: PRINCIPLES OF HIGH-QUALITY ASSESSMENT

INTRODUCTION

This module will introduce you to the principles and characteristics of a high-quality
assessment. The first part will discuss the productive and unproductive uses of tests. The second
part will focus on the classifications of tests and other types.

MODULE LEARNING OUTCOMES

In this module, you will be able to:


1. identify what constitutes high-quality assessments;
2. list down the productive and unproductive uses of tests; and
3. classify the various types of tests.

PRE-ASSESSMENT
Instruction: Complete the table below with the desired information.

What do I know about this lesson? | What do I want to know about this lesson? | What did I learn from this lesson?
MODULE MAP

CONTENT

ENGAGE ENGAGING ON THE ROLES OF ASSESSMENT IN INSTRUCTIONAL DECISIONS

Activity No. 1 – DID YOU KNOW?


Instruction: Write your thoughts about what constitutes a high-quality assessment in the rectangular box.

EXPLORE DELVING ON THE PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Activity No. 2 – Read Me!


Instruction: Closely read the text below.

Characteristics of High-Quality Assessments


High-quality assessments provide results that demonstrate and improve targeted student learning.
They also inform instructional decision making. To ensure the quality of any test, the following criteria must
be considered:

1. Clear and Appropriate Learning Targets

When designing a good assessment, start by asking whether the learning targets are at the right level of
difficulty to motivate students and whether there is an adequate balance among the different types of
learning targets.

A learning target is a clear description of what students know and are able to do. Stiggins and Conklin
(1992) categorize learning targets into five types.

a. Knowledge learning target is the ability of the student to master substantive subject matter.

b. Reasoning learning target is the ability to use knowledge to reason and solve problems.

c. Skill learning target is the ability to demonstrate achievement-related skills like conducting
experiments, playing basketball, and operating computers.

d. Product learning target is the ability to create achievement-related products such as written
reports, oral presentations, and art products.

e. Affective learning target is the attainment of affective traits such as attitudes, values, interests,
and self-efficacy.

2. Appropriateness of Assessment Methods

Once the learning targets have been identified, match them with their corresponding methods by
considering the strengths of various methods in measuring different targets.

Table 2.1: Matching Learning Targets with Assessment Methods


(Ratings indicate how well each method measures each target: 5 = excellent match, 1 = poor match.)

Target    | Objective | Essay | Performance-Based | Oral Question | Observation | Self-Report
Knowledge |     5     |   4   |         3         |       4       |      3      |      2
Reasoning |     2     |   5   |         4         |       4       |      2      |      2
Skills    |     1     |   3   |         5         |       2       |      5      |      3
Products  |     1     |   1   |         5         |       2       |      4      |      4
Affect    |     1     |   2   |         4         |       4       |      4      |      5
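
Table 2.1 can be read as a simple lookup: given a learning target, choose the method or methods with the highest rating. The sketch below is illustrative only (the module does not prescribe any code; Python is used merely to make the lookup concrete, and the ratings are copied from the table above):

```python
# Table 2.1 as a lookup structure (5 = excellent match, 1 = poor match).
RATINGS = {
    "Knowledge": {"Objective": 5, "Essay": 4, "Performance-Based": 3,
                  "Oral Question": 4, "Observation": 3, "Self-Report": 2},
    "Reasoning": {"Objective": 2, "Essay": 5, "Performance-Based": 4,
                  "Oral Question": 4, "Observation": 2, "Self-Report": 2},
    "Skills":    {"Objective": 1, "Essay": 3, "Performance-Based": 5,
                  "Oral Question": 2, "Observation": 5, "Self-Report": 3},
    "Products":  {"Objective": 1, "Essay": 1, "Performance-Based": 5,
                  "Oral Question": 2, "Observation": 4, "Self-Report": 4},
    "Affect":    {"Objective": 1, "Essay": 2, "Performance-Based": 4,
                  "Oral Question": 4, "Observation": 4, "Self-Report": 5},
}

def best_methods(target: str) -> list[str]:
    """Return the assessment method(s) rated highest for a learning target."""
    ratings = RATINGS[target]
    top = max(ratings.values())
    return [method for method, score in ratings.items() if score == top]

for target in RATINGS:
    print(f"{target}: {', '.join(best_methods(target))}")
```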

3. Validity

This refers to the extent to which a test serves its purpose, that is, how well it measures what it
intends to measure. Validity is a characteristic that pertains to the appropriateness of the inferences, uses,
and results of the test or any other method utilized to gather data.

Several factors influence the validity of a test, namely: appropriateness of test items, directions,
reading vocabulary and sentence structure, pattern of answers, and arrangement of items.

a. How Validity is Determined

Validity is always determined by professional judgment. However, there are different types of
evidence to use in determining validity. The following major sources of information can be used to
establish validity:

i. Content-related validity determines the extent to which the assessment is representative of the
domain of interest. Once the content domain is specified, review the test items to ensure that there is
a match between the intended inferences and what is on the test. A test blueprint or table of
specifications will help further delineate which targets should be assessed and what is important from
the content domain.
ii. Criterion-related validity determines the relationship between an assessment and another
measure of the same trait. It provides validity by relating an assessment to some valued measure
(criterion) that can either provide an estimate of current performance (concurrent criterion-related
evidence) or predict future performance (predictive criterion-related evidence).

iii. Construct-related validity determines whether an assessment is a meaningful measure of an
unobservable trait or characteristic like intelligence, reading comprehension, honesty, motivation,
attitude, learning style, and anxiety.

iv. Face validity is determined on the basis of the appearance of an assessment: whether, based
on a superficial examination, the test seems to be a reasonable measure of the objectives of a
domain.

v. Instructional-related validity determines the extent to which the domain of content in the test is
taught in class.

b. Test Validity Enhancers

The following are suggestions for enhancing the validity of classroom assessments:

i. Prepare a table of specifications (TOS).

ii. Construct appropriate test items.

iii. Formulate directions that are brief, clear, and concise.

iv. Consider the reading vocabulary of the examinees. The test should not be made up of
jargon.

v. Make the sentence structure of your test items simple.

vi. Never have an identifiable pattern of answers.

vii. Arrange the test items from easy to difficult.

viii. Provide adequate time for students to complete the assessment.

ix. Use different methods to assess the same thing.

x. Use the test only for intended purposes.

4. Reliability

This refers to the consistency with which a student may be expected to perform on a given test. It
means the extent to which a test is dependable, self-consistent, and stable.

There are factors that affect test reliability. These include (1) the scorer's inconsistency because of
his/her subjectivity, (2) limited sampling because of the incidental inclusion or accidental exclusion of some
materials in the test, (3) changes in the individual examinee and his/her instability during the examination,
and (4) the testing environment.

a. How Reliability is Determined

Test reliability is also influenced by the length of the test, the difficulty of the test, and the objectivity
of the scorer. There are four methods of estimating the reliability of a good measuring instrument.
i. Test-Retest Method or Test of Stability. The same measuring instrument is administered twice to the
same group of subjects, and the scores from the first and second administrations are correlated; the
resulting correlation coefficient estimates the test's stability. This method has limitations, however.
Memory effects may operate when the time interval is short, while unlearning and forgetting may occur
when the interval is long, resulting in a low correlation. Another limitation is that varying environmental
conditions may affect the correlation regardless of the time interval separating the two administrations.
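
As a minimal illustrative sketch (the scores below are hypothetical; the method itself only calls for a correlation coefficient), the test-retest estimate is simply the Pearson correlation between the two administrations:

```python
# Test-retest reliability: correlate the same students' scores from
# two administrations of the same test (hypothetical data).
from statistics import correlation  # Pearson's r; Python 3.10+

first_admin  = [18, 22, 15, 24, 20, 17, 23, 19]   # scores, first sitting
second_admin = [17, 23, 14, 25, 19, 18, 22, 20]   # scores, second sitting

r = correlation(first_admin, second_admin)
print(f"Test-retest (stability) estimate: r = {r:.2f}")
```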

ii. Parallel-Forms Method or Test of Equivalence. Parallel or equivalent forms of a test are
administered to the same group of subjects, and the paired observations are correlated. The two forms
of the test must be constructed so that the content, type of item, difficulty, instructions for
administration, and several other features are similar but not identical.

iii. Split-Half Method. The test in this method is administered only once, but the test items are
divided into two halves. The common procedure is to divide the test into odd and even items. The two
halves of the test must be similar but not identical in content, difficulty, means, and standard
deviations.
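
A minimal sketch of the odd-even split, using a hypothetical right/wrong matrix (rows are students, columns are items). The Spearman-Brown formula, standard practice although not named in the text above, projects the half-test correlation to the full test length:

```python
# Split-half reliability: score the odd and even items separately,
# correlate the halves, then apply the Spearman-Brown correction.
from statistics import correlation

items = [            # hypothetical right (1) / wrong (0) responses
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

odd_scores  = [sum(row[0::2]) for row in items]  # items 1, 3, 5
even_scores = [sum(row[1::2]) for row in items]  # items 2, 4, 6

r_half = correlation(odd_scores, even_scores)
r_full = (2 * r_half) / (1 + r_half)             # Spearman-Brown
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```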

iv. Assessment of Internal Consistency Method. This method is used with psychological tests that
are constructed of dichotomously scored items, where the testee either passes or fails an item. The
reliability coefficient in this method is obtained with the Kuder-Richardson formula.
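
A minimal sketch of the Kuder-Richardson formula 20 (KR-20), again with hypothetical dichotomous data (1 = pass, 0 = fail per item):

```python
# KR-20 internal consistency for dichotomously scored items:
# r = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
from statistics import pvariance

items = [            # hypothetical pass (1) / fail (0) responses
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

k = len(items[0])                     # number of items
n = len(items)                        # number of examinees
totals = [sum(row) for row in items]  # total score per examinee

pq_sum = 0.0
for j in range(k):
    p = sum(row[j] for row in items) / n  # proportion passing item j
    pq_sum += p * (1 - p)                 # p * q for item j

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(f"KR-20 = {kr20:.2f}")  # about 0.90 for this data
```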

b. The Concept of Error in Assessment

The concept of error in assessment is critical to the understanding of reliability. Conceptually,
whenever something is assessed, an observed score or result is produced. This observed score is the
sum of the true score, that is, the real ability or skill, and some degree of error.

Observed Score = True Score + Error

Thus, an observed score can be higher or lower than the true score, depending on the nature of the error.
The sources of error are reflected in Table 2.2, and a short simulation of the formula follows the table.

Table 2.2: SOURCES OF ERROR


Internal Error     | External Error
Health             | Directions
Mood               | Luck
Motivation         | Item ambiguity
Test-taking skills | Heat in the room
Anxiety            | Lighting
Fatigue            | Sample of items
General ability    | Observer differences and bias
                   | Test interpretation and scoring
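
To make the relationship in the formula concrete, here is a minimal simulation with hypothetical numbers (a student whose true score is 80, retested under randomly varying error):

```python
# Observed Score = True Score + Error: the same true ability produces
# different observed scores depending on the error in each sitting.
import random

random.seed(1)                  # fixed seed so the run is repeatable
true_score = 80
for attempt in range(1, 6):
    error = random.gauss(0, 3)  # error can be positive or negative
    observed = true_score + error
    print(f"attempt {attempt}: observed = {observed:.1f}")
```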

c. Test Reliability Enhancers

The following should be considered in enhancing the reliability of classroom assessments:

i. Use a sufficient number of items or tasks. A longer test is more reliable.

ii. Use independent raters or observers who can provide similar scores to the same performances.

iii. Make sure the assessment procedures and scoring are objective.

iv. Continue the assessment until the results are consistent.

v. Eliminate or reduce the influence of extraneous events or factors.

vi. Assess the difficulty level of the test.


vii. Use shorter assessments more frequently rather than a few long assessments.

5. Fairness

This pertains to the intent that each question should be made as clear as possible to the examinees
and that the test is free of bias. An example of bias in an intelligence test is an item about a person or
object that has not been part of the cultural and educational context of the test taker. In mathematics tests,
for instance, the reading difficulty level of an item can be a source of unfairness. Identified elements of
fairness are the student's knowledge of learning targets before instruction, the opportunity to learn, the
attainment of prerequisite knowledge and skills, unbiased assessment tasks and procedures, and teachers
who avoid stereotypes.

6. Positive Consequences

These enhance the overall quality of assessment, particularly the effect of assessments on the
students' motivation and study habits.

7. Practicality and Efficiency

Assessments need to take into consideration the teacher's familiarity with the method, the time
required, the complexity of administration, the ease of scoring and interpretation, and the cost in order to
determine an assessment's practicality and efficiency. Administrability requires that a test be administered
with ease, clarity, and uniformity. Directions must be specific so that students and teachers will understand
exactly what they must do. Scorability demands that a good test be easy to score. Test results should be
readily available to both students and teachers for remedial and follow-up measures.

Productive Uses of Tests

1. Learning Analysis. Tests are used to identify the reasons or causes why students do not learn and the
solutions to help them learn. Ideally, a test should be designed to determine what students do not know so
that teachers can take appropriate action.

2. Improvement of Curriculum. Poor performance in a test may indicate that the teacher is not explaining
the material effectively, the textbook is not clear, the students are not properly taught, or the students do not
see the meaningfulness of the materials. When only a few students have difficulties, the teacher can address
them separately and extend special help. If the entire class does poorly, the curriculum needs to be revised
or special units need to be developed for the class to continue.

3. Improvement of Teacher. In a reliable grading system, the class average is the grade the teacher has
earned.

4. Improvement of Instructional Materials. Tests measure how effective instructional materials are in
bringing about intended changes.

5. Individualization. Effective tests always indicate differences in students' learning. These can serve as
bases for individual help.

6. Selection. When enrollment opportunity or any other opportunity is limited, a test can be used to screen
those who are more qualified.

7. Placement. Tests can be used to determine to which category a student belongs.

8. Guidance and Counseling. Results from appropriate tests, particularly standardized tests, can help
teachers and counselors guide students in assessing future academic and career possibilities.
9. Research. Tests can be feedback tools to find effective methods of teaching and learn more about
students, their interests, goals and achievements.

10. Selling and Interpreting the School to the Community. Effective tests help the community understand
what the students are learning, since test items are representative of the content of instruction. Tests can
also be used to diagnose general schoolwide weaknesses and strengths that require community or
government support.

11. Identification of Exceptional Children. Tests can reveal exceptional students inside the classroom.
More often than not, these students are overlooked and left unattended.

12. Evaluation of Learning Program. Ideally, tests should evaluate the effectiveness of each element in a
learning program, not just provide blanket information about the total learning environment.

Unproductive Uses of Tests

1. Grading. Tests should not be used as the only determinants in grading a student. Most tests do not
accurately reflect a student's performance or true abilities. Poor performance on a certain task may indicate
not only failure but also the lack or absence of needed foundations.

2. Labeling. It is often a serious disservice to label a student, even if the label is positive. Negative labels
may lead students to believe the label and act accordingly. Positive labels, on the other hand, may lead
students to underachieve, avoid standing out as different, or become overconfident and stop exerting
effort.

3. Threatening. Tests lose their validity when used as disciplinary measures.

4. Unannounced Testing. Surprise tests are generally not recommended. More often than not, they are the
scapegoats of teachers who are unprepared, upset by an unruly class, or reprimanded by superiors. Studies
reveal that students perform at a slightly higher level when tests are announced; unannounced tests create
anxiety on the part of the students, particularly those who are already fearful of tests; unannounced tests do
not give students adequate time to prepare; and surprise tests do not promote efficient learning or higher
achievement.

5. Ridiculing. This means using tests to deride students.

6. Tracking. Students are grouped according to deficiencies as revealed by tests without continuous
reevaluation, thus locking them into categories.

7. Allocating Funds. Some schools exploit tests to solicit funding.

Classifications of Tests

Throughout the years, psychologists and educators have cooperatively produced new and better tests
and scales that measure students' performance with greater accuracy. These tests may be classified
according to:

1. Administration

a. Individual - given orally and requires the examiner's constant attention, since the manner of
answering may be as important as the score. An example of this is the Wechsler Adult Intelligence
Scale, one of the three individually administered intelligence scales. Another is a PowerPoint
presentation used as a performance test in a speech class.

b. Group - administered to several examinees at the same time, typically to measure cognitive skills
and achievement. Most tests in schools are considered group tests, where different test takers can
take the test as a group.

2. Scoring
a. Objective - independent scorers agree on the number of points the answer should receive, e.g.,
multiple choice and true or false.

b. Subjective - answers can be scored in various ways and are given different values by different
scorers, e.g., essays and performance tests.

3. Sort of Response being Emphasized

a. Power - allows examinees a generous time limit so that they can attempt every item. The questions
are difficult, and this difficulty is what is emphasized.
b. Speed - has severely limited time constraints, but the items are easy and only a few examinees are
expected to make errors.

4. Types of Response the Examinees must Make

a. Performance - requires students to perform a task. This is usually administered individually so that
the examiner can count the errors and measure the time the examinee takes on each task.

b. Paper-and-pencil - examinees are asked to write their answers on paper.

5. What is Measured

a. Sample - a limited but representative test designed to measure the total behavior of the examinee,
although no test can exhaustively measure all the knowledge of an individual.

b. Sign test - a diagnostic test designed to obtain diagnostic signs that suggest some form of
remediation is needed.

6. Nature of the Groups being Compared

a. Teacher-made test - constructed for use within the classroom and covers the subject taught by the
same teacher who constructed the test.

b. Standardized test - constructed by test specialists working with curriculum experts and teachers.

Other Types of Tests

1. Mastery tests measure the level of learning attained on a given set of materials.

2. Discriminatory tests distinguish the differences between students or groups of students. They indicate the
areas where students need help.

3. Recognition tests require students to choose the right answer from a given set of responses.

4. Recall tests require students to supply the correct answer from their memory.

5. Specific recall tests require short responses that are fairly objective.

6. Free recall tests require students to construct their own complex responses. There are no right answers,
but a given answer might be better than another.

7. Maximum performance tests require students to obtain the best score possible.

8. Typical performance tests measure the typical, usual, or average performance.

9. Written tests depend on the ability of the students to understand, read, and write.

10. Oral examinations depend on the examinees' ability to speak. Logic is also required.

11. Language tests require instructions and questions to be presented in words.


12. Non-language tests are administered by means of pantomime, painting, or signs and symbols, e.g.,
Raven's Progressive Matrices or the Abstract Reasoning Tests.

13. Structured tests have very specific, well-defined instructions and expected outcomes.

14. Projective tests present ambiguous stimulus or questions designed to elicit highly individualized
responses.

15. Product tests emphasize only the final answer.

16. Process tests focus on how the examinees attack, solve, or work out a problem.

17. External reports are tests where a ratee is evaluated by another person.

18. Internal reports are self-evaluation.

19. Open book tests depend on one's understanding and ability to express one's ideas and evaluate
concepts.

20. Closed book tests depend heavily on the memory of the examinees.

21. Non-learning format tests determine how much information the students know.

22. Learning format tests require the students to apply previously learned materials.

23. Convergent format tests purposely lead the examinees to one best answer.

24. Divergent format tests lead the examinees to several possible answers.

25. Scale measurements distribute ratings along a continuum.

26. Test measurements refer to items that are dichotomous, i.e., either right or wrong, but not both.

27. Pretests measure how much is known about a material before it is presented.

28. Posttests measure how much has been learned after a learning material has been given.

29. Sociometric tests reveal the interrelationships among members or the social structure of a group.

30. Anecdotal records reveal episodes of behavior that may indicate a profile of the students.

Table 2.3: Comparison between teacher-made tests and standardized tests


Characteristic | Teacher-Made Test | Standardized Test
Directions for Administration and Scoring | Usually, no uniform directions are specified. | Specific instructions standardize the administration and scoring procedures.
Sampling Content | Both content and sampling are determined by the classroom teacher. | Content is determined by curriculum and subject matter experts. It involves intensive investigations of existing syllabi, textbooks, and programs. Sampling of content is done systematically.
Construction | May be hurriedly done because of time constraints; often no test blueprints, item tryouts, item analysis, or revision; quality of the test may be quite poor. | Uses meticulous construction procedures that include constructing objectives and test blueprints, employing item tryouts, item analysis, and item revisions.
Norms | Only local classroom norms are available. | In addition to local norms, standardized tests typically make available national, school district, and school building norms.
Purpose and Use | Best suited for measuring particular objectives set by the teacher and for intraclass comparisons. | Best suited for measuring broader curriculum objectives and for interclass, school, and national comparisons.

EXPLAIN EXPLAINING PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Activity No. 3- Let Me Share!


Instruction: From the concepts presented above, explain why validity implies reliability but not the
converse. You may share your answers during the online synchronous class schedule.
_______________________________________________________________________________________
_________________________________________________________________________________
___________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________
_________________________________________________________________________________

EXTEND EXTENDING ON THE PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Activity No. 4- Let’s Generate!


Instruction: Generate some other qualities that you believe contribute to making assessments high-quality.
Write your answer in the box provided.

EVALUATE EVALUATING THE PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Activity No. 5- List Down!


Instruction: List down your personal experiences of unfair assessments. Write your answer in the box
provided.
TOPIC SUMMARY

In this module, you learned that:


● To ensure the quality of any test, the following criteria must be considered:
1. Clear and appropriate learning targets
2. Appropriateness of assessment methods
3. Validity
4. Reliability
5. Fairness
6. Positive consequences
7. Practicality and efficiency
● The following are productive uses of tests:
1. Learning analysis
2. Improvement of curriculum
3. Improvement of teacher
4. Improvement of instructional materials
5. Individualization
6. Selection
7. Placement
8. Guidance and counseling
9. Research
10. Selling and interpreting the school to the community
11. Identification of exceptional children
12. Evaluation of learning program
● The following are unproductive uses of tests:
1. Grading
2. Labeling
3. Threatening
4. Unannounced Testing
5. Ridiculing
6. Tracking
7. Allocating funds
● The following are classifications of tests:
1. Administration
2. Scoring
3. Sort of response being emphasized
4. Types of response the examinees must make
5. What is measured
6. Nature of the groups being compared
● There are about thirty (30) other types of tests.

POST-ASSESSMENT

Instruction: Complete the table below with the desired information.

What do I know about this lesson? | What do I want to know about this lesson? | What did I learn from this lesson?
REFERENCES

● Airasian, P. W. (1997). Classroom Assessment. New York: McGraw-Hill.
● Angelo, T. A., & Cross, K. P. (2007). Classroom Assessment Techniques: A Handbook for College
Teachers (2nd ed.). New York: Teachers College Press.
● Black, P., & Wiliam, D. (1998). Improving Evaluation Forms to Produce Better Course Design.
Performance and Instruction, 35(1). Boston: Kluwer Academic.
● Burns, E., et al. (n.d.). Formative Tests. http://www.ed.gov/pubs/formative/OR/Consumer Guides perfasse.html
● Brown, H. (2002). Objective Testing in Education and Training. London: Pitman Education Library.
● Cronbach, L. J. (1990). Five Perspectives on the Validity of Argument. In H. Wainer & H. I. Braun
(Eds.), Test Validity. Hillsdale, NJ: Lawrence Erlbaum.
● Cronbach, L. J. (1990). Essentials of Psychological Testing (5th ed.). New York: Harper and Row.
● Fuchs, H. (2008). Assessment of Learning. http://www.eric.ed.gov/contentdelivery/servlet/ERICServlet?accno-ED376695
● Fuchs, H. (1994). Mastery Learning. http://www.kde.state.ky.us/oapd/curric/publications/PerformanceEvent/TOC.htm
● Guskey, T. (2007/2008). The Rest of the Story. Educational Leadership, 65(4). Washington, DC:
Bookshelf (Editorial Projects in Education).
● Gronlund, N. E. (1988). How to Make Achievement Tests and Assessments. Boston: Allyn and Bacon.
● Gronlund, N. E. (1985). Measurement and Evaluation in Teaching. New York: Macmillan Publishing
Company, Inc.
● Johnson and Johnson. (2002). Standards for Educational and Psychological Testing. Washington, DC:
Authors.
● Kemp, Morrison, and Ross. (1998). Formative Tests: Objective Testing. London: Morrison Ubooks,
University of London Press.
● Linn, R. L. (1992). Measurement in Education. Encyclopedia of Educational Research (Vol. 3). Upper
Saddle River, NJ: Prentice Hall.
● Linn, R. L., & Gronlund, N. E. (2000). Measurement and Assessment in Teaching. Upper Saddle
River, NJ: Prentice Hall.
● McTigle, U. (2006). How to Assess Student Learning. Encyclopedia of Educational Research (Vol. 8).
Upper Saddle River, NJ: Prentice Hall.
● Popham. (2000). Formative Assessment. http://74.6.146.127/search/cache?ei=UTF8&p-bias-assessment&y
● Ross-Gordon, J. (2009). Introduction to Observation Forms and Guidelines for Practice.
http://www.tcg.org/tools/education/teams/observation.cfm
● Scriven, H. (1991). Thinking Processes: Alternatives to Standardized Educational Assessment. ERIC
Digest [Online]. http://www.eric.ed.gov/contentdelivery/servlet/ERICServlet?accno-ED312773
● Tomlinson, I., & McTigle, U. (2006). Diagnostic Tests. http://www.english-zone.com/study/essays.html
