Measurement and Evaluation
EVALUATION
Outline:
Introduction
Definition of assessment and evaluation
Aim of student evaluation
Steps in student evaluation
The basic principles of assessment/evaluation
Regulation of learning by the teacher
Types of evaluation
Qualities of a test
Characteristics of measurement instrument
Advantages and disadvantages of different types of tests
Introduction
Assessment and evaluation are essential components of the teaching-learning process.
Types of evaluation
1- Formative evaluations:
It is an ongoing classroom process that keeps students and educators informed of students' progress toward the program's learning objectives.
The main purpose of formative evaluation is to improve instruction and student learning.
2- Summative evaluations
It occurs most often at the end of a unit. The teacher uses summative evaluation to determine what students have learned and to assign grades.
3- Diagnostic evaluation
It usually occurs at the beginning of the school year or unit, to identify students' strengths and weaknesses before instruction begins.
Principles of Evaluation
Evaluation should be
1. Based on clearly stated objectives
2. Comprehensive
3. Cooperative
4. Used Judiciously
5. A continuous and integral part of the teaching-learning process
Qualities of a test
Directly related to educational objectives
Realistic & practical
Concerned with important & useful matters
Comprehensive but brief
Precise & clear
QUALITIES OF A GOOD MEASURING INSTRUMENT
Validity means the degree to which a test or measuring instrument measures what it intends to measure. The validity of a measuring instrument has to do with its soundness: what the test measures, its effectiveness, and how well it can be applied.
For instance, to judge the validity of a performance test, it is necessary to consider what the performance test is supposed to measure and how well it measures it.
VALIDITY
Denotes the extent to which an instrument is measuring what it is supposed to measure.
Criterion-Related Validity
A method for assessing the validity of an instrument by comparing its scores with another criterion already known to be a measure of the same trait or skill. The resulting correlation is called the validity coefficient.
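In practice the validity coefficient is usually computed as a correlation between the instrument's scores and the criterion scores. A minimal sketch in Python, using hypothetical score lists (the data below are illustrative, not from this document):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: scores on a new test vs. an established criterion measure
test_scores = [12, 15, 11, 18, 14, 16, 10, 17]
criterion   = [48, 55, 45, 70, 52, 62, 40, 66]
r_xy = pearson_r(test_scores, criterion)  # the validity coefficient
```

A coefficient near 1.0 would indicate strong criterion-related validity; values near 0 indicate the instrument does not track the criterion.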
Types of Validity
CONTENT VALIDITY
Content validity means the extent to which the test items represent the content or subject matter that the test is designed to measure. It is described by the relevance of a test to different types of criteria, and it is commonly used in evaluating achievement tests.
ILLUSTRATION
For instance, a teacher wishes to validate an achievement test: the items are checked against the instructional objectives to see that the content taught is adequately represented.
CONCURRENT VALIDITY
Concurrent validity is the degree to which the test scores correspond to scores on another criterion measure administered at approximately the same time.
ILLUSTRATION
For example, a teacher wishes to validate a newly constructed test: the students' scores on it are correlated with their scores on an already established measure taken at about the same time.
PREDICTIVE VALIDITY
Predictive validity is determined by showing how well predictions made from the test are confirmed by evidence gathered at some subsequent time. The criterion measure for this type of validity is important because it is the later outcome that the test is meant to predict.
ILLUSTRATION
For instance, the teacher wants to estimate students' future performance from their present test scores: the test scores are later correlated with the criterion measure once it becomes available.
CONSTRUCT VALIDITY
The extent to which a test measures a theoretical construct or attribute.
CONSTRUCT
Abstract concepts, such as intelligence, self-concept, motivation, aggression and creativity, that can be observed by some type of instrument.
ILLUSTRATION
For example, a teacher wishes to establish the construct validity of an instrument intended to measure a construct such as creativity: scores on the instrument should agree with other measures of the same construct and disagree with measures of different constructs.
A test's construct validity is often assessed by its convergent and discriminant validity.
FACTORS AFFECTING VALIDITY
1. Test-related factors
2. The criterion to which you compare your instrument may not be well enough established
3. Intervening events
4. Reliability
RELIABILITY
Reliability means the extent to which a test gives consistent results from one administration to another; in short, the consistency of measurements.
A RELIABLE TEST
Produces similar scores across various conditions and situations, including different evaluators and testing environments.
RELIABILITY COEFFICIENTS
The statistic for expressing reliability.
Expresses the degree of consistency in the measurement of test scores.
Denoted by the letter r with two identical subscripts (rxx).
For instance, Student C took a Chemistry test twice; if the test is reliable, the two scores will be similar.
TEST-RETEST RELIABILITY
rs = 1 − (6ΣD²) / (N³ − N)
where:
rs = Spearman rho
ΣD² = sum of the squared differences between ranks
N = total number of cases

Student   X (1st test)   Y (2nd test)   Rx      Ry      D       D²
1         90             70             2.0     7.5     −5.5    30.25
2         43             31             13.0    12.5    0.5     0.25
3         84             79             6.5     3.0     3.5     12.25
4         86             70             4.5     7.5     −3.0    9.00
5         55             43             11.0    10.5    0.5     0.25
6         77             70             8.5     7.5     1.0     1.00
7         84             75             6.5     4.5     2.0     4.00
8         91             88             1.0     1.0     0.0     0.00
9         40             31             14.0    12.5    1.5     2.25
10        75             70             10.0    7.5     2.5     6.25
11        86             80             4.5     2.0     2.5     6.25
12        89             75             3.0     4.5     −1.5    2.25
13        48             30             12.0    14.0    −2.0    4.00
14        77             43             8.5     10.5    −2.0    4.00
TOTAL                                                   ΣD² =   82.00

rs = 1 − 6(82) / [(14)³ − 14]
   = 1 − 492 / 2730
   = 1 − 0.18021978
   = 0.82 (high relationship)
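The test-retest computation above can be reproduced in Python. This is a sketch of the same procedure the example uses: average ranks for tied scores, then rs = 1 − 6ΣD²/(N³ − N); the score lists are taken from the worked example.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, averaging tied positions."""
    order = sorted(scores, reverse=True)
    ranks = []
    for v in scores:
        first = order.index(v) + 1           # first position of v in the sorted list
        last = first + order.count(v) - 1    # last position of v (handles ties)
        ranks.append((first + last) / 2)
    return ranks

def spearman_rho(x, y):
    """Spearman rho via rs = 1 - 6*sum(D^2) / (N^3 - N)."""
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d2 / (n ** 3 - n)

# Scores from the worked example: first (X) and second (Y) administrations
x = [90, 43, 84, 86, 55, 77, 84, 91, 40, 75, 86, 89, 48, 77]
y = [70, 31, 79, 70, 43, 70, 75, 88, 31, 70, 80, 75, 30, 43]
rho = spearman_rho(x, y)  # ≈ 0.82, matching the example
```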
PARALLEL-FORMS METHOD
Parallel or equivalent forms of a test are constructed, administered to the same group of students, and the two sets of scores are correlated.
ALTERNATE-FORMS RELIABILITY
Also known as equivalent-forms reliability.
SPLIT-HALF METHOD
The test items are divided into two halves. The common procedure is to divide the test into odd-numbered and even-numbered items. The two halves of the test must be similar, but not identical, in content, number of items, difficulty, means and standard deviations. Each student obtains two scores: one on the odd and the other on the even items of the same test. The scores obtained on the two halves are correlated. The result is a reliability coefficient for a half test. Since this reliability holds only for a half test, the reliability coefficient for the whole test may be estimated by using the Spearman-Brown formula.
Split-Half Reliability
Sometimes referred to as internal consistency.
Indicates that subjects' scores on some trials consistently match their scores on other trials.
Formula (Spearman-Brown):
rwt = 2(rht) / (1 + rht)
where rht is the half-test reliability coefficient and rwt the whole-test reliability coefficient.
For instance, a test is administered to ten students as a pilot sample to test the reliability coefficient of the odd and even items. Each student's score on the odd items (X) and on the even items (Y) is ranked (Rx, Ry), the differences between ranks are squared (D²), and Spearman rho is computed for the two halves.
[Table of odd/even scores and ranks; ΣD² = 26.50]
rht = .84
rwt = 2(.84) / (1 + .84) = .91 (very high reliability)
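The Spearman-Brown step-up used in the split-half example can be checked with a one-line function (the .84 half-test value is from the example above):

```python
def spearman_brown(r_half):
    """Step a half-test reliability up to the whole-test reliability:
    r_whole = 2 * r_half / (1 + r_half)."""
    return 2 * r_half / (1 + r_half)

r_whole = spearman_brown(0.84)  # ≈ 0.91, matching the example
```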
KUDER-RICHARDSON METHOD (KR-20)
rtt = [N / (N − 1)] × [1 − (Σpiqi / SD²)]
where:
N = number of items in the test
pi = proportion of students answering item i correctly
qi = 1 − pi
SD² = variance of the total scores
[Item-analysis table of pi, qi and piqi values; for the sample data, Σpiqi = 1.9509]
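The item-analysis table above (proportions p and q and their products pq) feeds the Kuder-Richardson formula 20. As a sketch, KR-20 can be computed directly from a matrix of item responses (1 = correct, 0 = wrong); the matrix below is hypothetical, not the table's data, and the sample variance (n − 1 denominator) is assumed.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 from a students x items 0/1 matrix."""
    n_students = len(responses)
    n_items = len(responses[0])
    # Sum of p*q over items, where p = proportion answering the item correctly
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_students
        pq_sum += p * (1 - p)
    # Sample variance of the students' total scores (n - 1 denominator assumed)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_students
    var = sum((t - mean) ** 2 for t in totals) / (n_students - 1)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var)

# Hypothetical responses: 5 students x 4 items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_tt = kr20(data)  # ≈ 0.91 for this perfectly ordered response pattern
```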
INTERRATER RELIABILITY
Involves having two raters independently
observe and record specified behaviors,
such as hitting, crying, yelling, and getting
out of the seat, during the same time period
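A simple index of interrater reliability is the percentage of observation intervals on which the two raters agree. A minimal sketch with hypothetical observation records (1 = target behavior observed in that interval, 0 = not observed):

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of intervals in which both raters recorded the same thing."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

# Hypothetical records for 10 observation intervals
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
agreement = percent_agreement(rater_a, rater_b)  # 0.8 (8 of 10 intervals match)
```

Percent agreement is the simplest such index; it does not correct for chance agreement.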
TARGET BEHAVIOR
A specific behavior the observer is looking to record.
OBTAINED SCORE
The score you get when you administer a
test
Consists of two parts: the true score and
the error score
STANDARD ERROR of MEASUREMENT (SEM)
An estimate of the amount by which a student's obtained score is likely to differ from his or her true score.
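The SEM is conventionally estimated as SEM = SD × √(1 − rxx), where SD is the standard deviation of the test scores and rxx the test's reliability. A sketch with assumed values (SD = 10 and r = 0.84 are illustrative, not from this document):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r): expected spread of obtained scores
    around a student's true score."""
    return sd * math.sqrt(1 - reliability)

# Assumed values: SD = 10 points, reliability r = 0.84
sem = standard_error_of_measurement(10, 0.84)  # ≈ 4.0 points
```

A student's true score then lies within about ±1 SEM of the obtained score roughly two-thirds of the time.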
FACTORS AFFECTING RELIABILITY
1. Test length
2. Test-retest interval
3. Variability of scores
4. Guessing
5. Variation within the test situation
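The effect of test length (factor 1) can be quantified with the general Spearman-Brown prophecy formula, rnew = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened. A sketch with illustrative values:

```python
def prophecy(r, n):
    """Predicted reliability after lengthening a test by a factor of n
    (general Spearman-Brown prophecy formula)."""
    return n * r / (1 + (n - 1) * r)

# Doubling a test whose reliability is 0.60
r_doubled = prophecy(0.60, 2)  # ≈ 0.75
```

With n = 2 this reduces to the split-half step-up formula: longer tests are, other things being equal, more reliable.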
GROUP.
MODERATE ITEM DIFFICULTY.
OBJECTIVE SCORING.
LIMITED TIME
USABILITY
Usability means the degree to which the measuring instrument can be satisfactorily used by teachers and administrators without undue expenditure of time, money, and effort.
1- Oral examinations
Disadvantages
1. Lack standardization.
2. Lack objectivity and reproducibility of results.
3. Permit favoritism and possible abuse of the
personal contact.
4. Suffer from undue influence of irrelevant factors.
5. Suffer from shortage of trained examiners to
administer the examination.
6. Are excessively costly in terms of professional time in relation to the limited value of the information they yield.
2- Practical examinations
Advantages
1. Provide opportunity to test in realistic setting skills involving
all the senses while the examiner observes and checks
performance.
2. Provide opportunity to confront the candidate with
problems he has not met before both in the laboratory and
at the bedside, to test his investigative ability as opposed
to his ability to apply ready-made "recipes".
3. Provide opportunity to observe and test attitudes and
responsiveness to a complex situation (videotape
recording).
4. Provide opportunity to test the ability to communicate
under pressure, to discriminate between important and
trivial issues, to arrange the data in a final form.
2- Practical examinations
Disadvantages
1. Lack standardized conditions in laboratory
experiments using animals, in surveys in the
community or in bedside examinations with patients of
varying degrees of cooperativeness.
2. Lack objectivity and suffer from intrusion of irrelevant factors.
3. Are of limited feasibility for large groups.
4. Entail difficulties in arranging for examiners to
observe candidates demonstrating the skills to be
tested.
3- Essay examinations
Advantages
1. Provide candidate with opportunity to demonstrate
his knowledge and his ability to organize ideas and
express them effectively
Disadvantages
1. Limit severely the area of the student's total work
that can be sampled.
2. Lack objectivity.
3. Provide little useful feedback.
4. Take a long time to score.
4- Multiple-choice questions
Advantages
1. Ensure objectivity, reliability and validity; preparation of
questions with colleagues provides constructive criticism.
2. Increase significantly the range and variety of facts that
can be sampled in a given time.
3. Provide precise and unambiguous measurement of the
higher intellectual processes.
4. Provide detailed feedback for both student and teachers.
5. Are easy and rapid to score.
4- Multiple-choice questions
Disadvantages
1. Take a long time to construct in order to avoid
arbitrary and ambiguous questions.
2. Also require careful preparation to avoid
preponderance of questions testing only recall.
3. Provide cues that do not exist in practice.
4. Are "costly" where number of students is
small.