Chapter 7 Interpreting Test Scores


Interpreting Scores and Norms 

Scores from achievement, aptitude, attitude, and psychological tests do not have a
true zero point. This means that, in both theory and practice, when a student misses every item on an
achievement, aptitude, attitude, or psychological test, we do not interpret this to mean the
student has no ability, knowledge, or skill in that area. For example, if a person answers none of
the items correctly on an intelligence test, we do not say that person has no intelligence. As another
example, if a student misses every question on the math subtest of an achievement test, we do
not say that student has no ability, knowledge, or skill in math. As a third example, if a person gives
incorrect answers on all items of an anxiety test, we do not interpret that to mean that the
person has no anxiety about anything, ever. 

Scores from achievement, aptitude, attitude, and psychological tests cannot be compared directly
with each other unless the norming group and the scale on which each score is based are taken
into consideration. We addressed this issue in earlier modules on aptitude and achievement tests.

Methods of Interpreting Test Scores 

1. Criterion-Referenced Interpretation: Based on mastery of a specific set of skills 

Are the achievement domains clearly defined? 

Are there enough items for each skill tested? 

What is the difficulty level of the items? 

What type(s) of items are used? 

What is the match of items to objectives?

2. Norm-Referenced Interpretation: Based on comparison of individuals to clearly defined groups (called norming groups) 
Are the test norms relevant? 
Are the test norms representative? 
Are the test norms up to date? 
Are the test norms comparable? 
Are the test norms adequately described?

Types of Test Scores and Defined Purpose 

Raw scores -- the number of items correct or the number of points earned; not of much use by
themselves 

Grade Equivalent scores -- the grade level of the group whose average raw score matches the
student's raw score; used to estimate or monitor growth 

Standard scores -- the distance of the student's raw score from the mean (average), expressed in
standard deviation units; used to monitor growth; better at reflecting reality than grade
equivalent scores (see the sketch after these definitions) 

Normal Curve Equivalent  -- a normalized standard score; used to avoid problems with grade
equivalent scores and used to describe group performance and to show growth over time 

Percentile Ranks -- student's relative position in a group in terms of the percentage of students
scoring lower than or equal to that student; used to determine relative areas of strengths and
weaknesses; can create profile analyses from these scores.
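To make the relationships among standard scores, Normal Curve Equivalents, and percentile ranks concrete, here is a minimal sketch that converts one raw score into each of the three, assuming the norming group's raw scores are approximately normally distributed. The mean, standard deviation, and raw score used are hypothetical values, not figures from any particular test.

```python
from statistics import NormalDist

# Hypothetical norming-group statistics for one subtest (assumed values)
mean, sd = 48.0, 6.0
raw = 57.0  # one student's raw score (hypothetical)

# Standard (z) score: distance from the mean in standard-deviation units
z = (raw - mean) / sd

# Percentile rank: percentage of the norming group scoring at or below this
# score, under the normality assumption
percentile_rank = NormalDist().cdf(z) * 100

# Normal Curve Equivalent: normalized standard score with mean 50 and SD 21.06
nce = 50 + 21.06 * z

print(f"z = {z:.2f}, NCE = {nce:.1f}, percentile rank = {percentile_rank:.0f}")
# z = 1.50, NCE = 81.6, percentile rank = 93
```

Note that the percentile rank conversion in the sketch relies on the normality assumption; in practice, publishers derive percentile ranks directly from the norming group's actual score distribution.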

 Cautions in Interpreting Any Test Score 

1. A test score should be interpreted in terms of the specific test from which it was derived. 

2. A test score should be interpreted in light of all of the student's relevant characteristics. 

3. A test score should be interpreted according to the type of decision to be made. 

4. A test score should be interpreted as a band of scores rather than as a specific score (see the sketch after this list). 

5. A test score should be verified by supplementary evidence. 

6. Do NOT interpret a grade equivalent score as an estimate of the grade where a student should
be placed. 

7. Do NOT assume that the units are equal at different parts of the scale. 

8. Do NOT assume that scores on different tests are comparable. 

9. Do NOT interpret extreme scores as dependable estimates of a student's performance.
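One common way to build the "band of scores" mentioned in point 4 is to use the test's standard error of measurement (SEM). The sketch below computes such a band from a reliability coefficient and a standard deviation; the specific numbers are assumed for illustration and are not taken from any particular test.

```python
from math import sqrt

# Hypothetical test statistics (assumed values)
sd = 15.0           # standard deviation of the standard-score scale
reliability = 0.91  # reliability coefficient reported by the publisher
observed = 108      # one student's observed standard score

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sem = sd * sqrt(1 - reliability)

# Roughly 68% of observed scores fall within 1 SEM of the true score,
# and roughly 95% fall within 2 SEM
band_68 = (observed - sem, observed + sem)
band_95 = (observed - 2 * sem, observed + 2 * sem)

print(f"SEM = {sem:.1f}")                                 # SEM = 4.5
print(f"68% band: {band_68[0]:.1f} to {band_68[1]:.1f}")  # 103.5 to 112.5
print(f"95% band: {band_95[0]:.1f} to {band_95[1]:.1f}")  # 99.0 to 117.0
```

Reporting the band (for example, "about 104 to 112") rather than the single observed score keeps the interpretation from implying more precision than the test actually has.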

Test publishers provide a variety of ways of presenting results. The figures in your text are just a
few of the presentations possible. 

Is this student's score for verbal reasoning: 

1. above average 

2. average 

3. below average 

Is this student's score for abstract reasoning: 

1. above average 

2. average 

3. below average  

Interpreting Scale Scores 

Scale scores vary from test to test and from grade to grade within the same test. The range,
standard deviation, and mean vary by test, subtest, and grade. Scale scores are very often reported,
and they can be converted to a cumulative frequency at midpoint, which in turn can be converted to a
percentile rank, which is much easier to interpret. 
Given a scale score, the number of students who scored below it, and the number of students who
earned exactly that scale score, the cumulative frequency at midpoint can be calculated. The
cumulative frequency at midpoint is defined as the number of students who earned scale scores below
the given score plus one half of the number of students who earned exactly that score. 

For example, if we know that 37 students earned scale scores lower than 400 and 6 students
earned scale scores of exactly 400, then we take one half of 6 (which is 3) and add it to 37, so
the cumulative frequency at midpoint for a scale score of 400 is 40. If we then divide that by
the number of students who took the test, we have the percentile rank. Given that 50 students
took the test, we divide 40 by 50 and obtain a percentile rank of 80. Now we know that this
student performed as well as or better than 80% of his or her peers. We would also say that this
student's performance is above average. 
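The arithmetic can also be written out directly. The sketch below reproduces the worked example above; the function name and parameter names are only illustrative.

```python
def percentile_rank(num_below: int, num_at: int, total: int) -> float:
    """Percentile rank from the cumulative frequency at midpoint.

    The cumulative frequency at midpoint is the number of students scoring
    below the scale score plus half of the students earning exactly that score.
    """
    cum_freq_midpoint = num_below + num_at / 2
    return 100 * cum_freq_midpoint / total

# Worked example from the text: 37 students below a scale score of 400,
# 6 students exactly at 400, and 50 students tested in total
print(percentile_rank(num_below=37, num_at=6, total=50))  # 80.0
```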
