Assessment 2 - FOR 4TH YEAR COED
WHAT TO EXPECT
COURSE TITLE: PROFESSIONAL EDUCATION: ASSESSMENT OF LEARNING 2
COURSE DESCRIPTION: This course focuses on the principles, development, and utilization
of alternative forms of assessment in measuring authentic learning. It emphasizes how to
assess process- and product-oriented learning outcomes as well as affective learning.
Students will experience how to develop rubrics and other assessment tools for
performance-based and product-based assessment.
LET Competencies:
1. Apply principles in constructing and interpreting traditional forms of assessment
2. Utilize processed data and results in reporting and interpreting learners' performance to
improve teaching and learning
3. Demonstrate skills in the use of techniques and tools in assessing affective learning
PERFORMANCE-BASED ASSESSMENT
Performance-Based Assessment is a process of gathering information about students' learning
through actual demonstration of essential and observable skills and creation of products that
are grounded in real-world contexts and constraints. It is an assessment that is open to many
possible answers and judged using multiple criteria or standards of excellence that are
pre-specified and public.
Reasons for Using Performance-Based Assessment
Dissatisfaction with the limited information obtained from selected-response tests.
Influence of cognitive psychology, which demands the learning not only of declarative but
also of procedural knowledge.
Negative impact of conventional tests, e.g., high-stakes assessment, teaching to the test.
It is appropriate in experiential, discovery-based, integrated, and problem-based
learning approaches.
Types of Performance-based Task
1. Demonstration-type - this is a task that requires no product
Examples: constructing a building, cooking demonstrations, entertaining tourists, teamwork,
presentations
2. Creation-type - this is a task that requires tangible products.
Examples: project plan, research paper, project flyers
Methods of Performance-based Assessment
1. Written open-ended - a written prompt is provided
Formats: essays, open-ended tests
2. Behavior-based - utilizes direct observations of behaviors in situations or simulated contexts
Formats: structured (a specific focus of observation is set at once) and unstructured
(anything observed is recorded or analyzed)
3. Interview-based - examinees respond in a one-to-one conference setting with the examiner to
demonstrate mastery of the skills
Formats: structured (interview questions are set at once) and unstructured (interview
questions depend on the flow of conversation)
4. Product-based - examinees create a work sample or a product utilizing the skills/abilities
Formats: restricted (products for the same objective are the same for all students)
and extended (students vary in their products for the same objective)
5. Portfolio-based - collections of works that are systematically gathered to serve many
purposes
How to Assess a Performance
1. Identify the competency that has to be demonstrated by the students with or
without a product.
2. Describe the task to be performed by the students either individually or as a group, the
resources needed, time allotment and other requirements to be able to assess the focused
competency.
7 Criteria in Selecting a Good Performance Assessment Task (Burke, 1999)
■ Generalizability - the likelihood that the students' performance on the task will generalize to
comparable tasks.
■ Authenticity - the task is similar to what the students might encounter in the
real world, as opposed to encountering it only in school.
■ Multiple Foci - the task measures multiple instructional outcomes.
■ Teachability - the task allows one to master the skill that one should be proficient in.
■ Feasibility - the task is realistically implementable in relation to its cost,
space, time, and equipment requirements.
■ Scorability - the task can be reliably and accurately evaluated.
■ Fairness - the task is fair to all students regardless of their social status or gender.
3. Develop a scoring rubric reflecting the criteria, levels of performance and the scores.
PERFORMANCE AND AUTHENTIC ASSESSMENTS

When to Use:
- Specific behaviors or behavioral outcomes are to be observed
- There is the possibility of judging the appropriateness of students' actions
- A process or outcome cannot be directly measured by paper-and-pencil tests

Advantages:
- Allow evaluation of complex skills which are difficult to assess using written tests
- Positive effect on instruction and learning
- Can be used to evaluate both the process and the product

Limitations:
- Time-consuming to administer, develop, and score
- Subjectivity in scoring
- Inconsistencies in performance on alternative skills
PORTFOLIO ASSESSMENT
Portfolio Assessment is also an alternative to pen-and-paper objective test It is a purposeful, ongoing,
dynamic, and collaborative process of gathering multiple indicators of the learner's growth and
development. Portfolio assessment is also performance-based but more authentic than any
performance-based task.
Reasons for Using Portfolio Assessment
Burke (1999) recognizes the portfolio as another type of assessment and considers it
authentic for the following reasons:
It tests what is really happening in the classroom.
It offers multiple indicators of students' progress.
It gives the students the responsibility for their own learning.
It offers opportunities for students to document reflections on their learning.
It demonstrates what the students know in ways that encompass their personal learning styles
and multiple intelligences.
It offers teachers a new role in the assessment process.
It allows teachers to reflect on the effectiveness of their instruction.
It provides teachers freedom in gaining insights into the student's development or achievement
over a period of time.
Principles Underlying Portfolio Assessment
There are three underlying principles of portfolio assessment: content, learning, and equity principles.
1. The content principle suggests that portfolios should reflect the subject matter that is important for
the students to learn.
2. The learning principle suggests that portfolios should enable the students to become active and
thoughtful learners.
3. The equity principle explains that portfolios should allow students to demonstrate their learning styles
and multiple intelligences.
Types of Portfolios
Portfolios come in three types: working, show, or documentary.
1. The working portfolio is a collection of a student's day-to-day works which reflect
his/her learning.
2. The show portfolio is a collection of a student's best works.
3. The documentary portfolio is a combination of a working and a show portfolio.
Characteristics of Portfolios:
1. Adaptable to individualized instructional goals
2. Focus on assessment of products
3. Identify students' strengths rather than weaknesses
4. Actively involve students in the evaluation process
5. Communicate student achievement to others
6. Time-consuming
7. Need a scoring plan to increase reliability
STEPS IN PORTFOLIO DEVELOPMENT

TYPES | DESCRIPTION
Showcase | A collection of students' best work
Reflective | Used for helping teachers, students, and family members think about various dimensions of student learning (e.g., effort, achievement, etc.)
Cumulative | A collection of items done over an extended period of time; analyzed to verify changes in the products and processes associated with student learning
Goal-based | A collection of works chosen by students and teachers to match pre-established objectives
Process | A way of documenting the steps and processes a student has done to complete a piece of work
RUBRICS
→ scoring guides, consisting of specific pre-established performance criteria, used in evaluating
student work on performance assessments
DEVELOPING RUBRICS
A rubric is a measuring instrument used in rating performance-based tasks. It is the "key to
corrections" for assessment tasks designed to measure the attainment of learning competencies
that require demonstration of skills or creation of products of learning. It offers a set of guidelines
or descriptions for scoring different levels of performance or qualities of products of learning; it
can be used in scoring both the process and the products of learning.
Below is a Venn Diagram that shows the graphical comparison of rubric, rating scale and checklist.
TYPES OF RUBRICS
The two common types are:
1. Holistic Rubric – requires the teacher to score the overall process or product as a whole,
without judging the component parts separately
2. Analytic Rubric – requires the teacher to score individual components of the product or
performance first, then sum the individual scores to obtain a total score
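As a concrete illustration, here is a minimal Python sketch of how the two rubric types produce a score; the criteria, levels, and ratings are hypothetical, not taken from the handout.

```python
# A minimal sketch contrasting the two rubric types described above.
# The criteria, levels, and ratings are hypothetical, not from the handout.

# Holistic: one overall judgment of the whole product or performance.
holistic_levels = {4: "Excellent", 3: "Good", 2: "Fair", 1: "Poor"}
holistic_score = 3                                    # a single overall rating
print("Holistic:", holistic_levels[holistic_score])  # Holistic: Good

# Analytic: rate each component separately, then sum for a total score.
analytic_ratings = {"Content": 4, "Organization": 3, "Mechanics": 2}
print("Analytic total:", sum(analytic_ratings.values()))  # Analytic total: 9
```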
ASSESSMENT METHODS
Affective Assessment Procedures/Tools
> Observational Techniques - used in assessing affective and other non-cognitive learning
outcomes and aspects of development of students.
Anecdotal Records - a method of recording factual descriptions of students' behavior.
III. Likert Scale - an assessment instrument which asks an individual to respond to a series of
statements by indicating whether she/he strongly agrees (SA), agrees (A), is undecided (U),
disagrees (D), or strongly disagrees (SD) with each statement. Each response is associated
with a point value, and an individual's score is determined by summing up the point values
for each positive statement: SA - 5, A - 4, U - 3, D - 2, SD - 1. For negative statements, the
point values are reversed, that is, SA - 1, A - 2, and so on.
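The scoring rule above is easy to mechanize. Below is a minimal Python sketch of Likert scoring with reversal for negative statements; the items and responses are hypothetical.

```python
# A minimal sketch of Likert-scale scoring as described above.
# Item polarities and responses are hypothetical examples.

SCALE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def item_score(response: str, positive: bool) -> int:
    """Score one Likert response; reverse the scale for negative statements."""
    value = SCALE[response]
    return value if positive else 6 - value  # 6 - 5 = 1, 6 - 1 = 5

def total_score(responses, polarities):
    """Sum the item scores for one respondent."""
    return sum(item_score(r, p) for r, p in zip(responses, polarities))

# Example: three items, the second worded negatively.
print(total_score(["SA", "D", "A"], [True, False, True]))  # 5 + 4 + 4 = 13
```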
b. Projective Tests
• Projective tests were developed in an attempt to eliminate some of the
major problems inherent in the use of self-report measures, such as the tendency
of some respondents to give socially acceptable responses.
• The purposes of such tests are usually not obvious to respondents; the individual is
typically asked to respond to ambiguous items.
• The most commonly used projective technique is the method of association. This
technique asks the respondent to react to a stimulus such as a picture, inkblot, or
word.
Projective tests -- use an ambiguous stimulus and ask the examinee to describe or
tell a story about it
Self-report tests -- also called objective tests or inventories; directly ask people
whether items describe them or not
Behavior rating -- an inventory that asks an observer to rate the examinee on a
number of dimensions
Checklist -- an assessment instrument that calls for a simple yes-no judgment. It is
basically a method of recording whether a characteristic is present or absent, or
whether an action was or was not taken.
e.g., a checklist of a student's daily activities
PROJECTIVE TECHNIQUES
• Projective drawings
- draw-a-person
- house-tree-person
- kinetic family drawing
• Rorschach Inkblot: what do you see?
• Thematic Apperception Test
• Sentence-completion
• Graphology
Example: [inkblot image]
AFFECTIVE ASSESSMENTS
1. Closed-Item or Forced-Choice Instruments – ask for one specific answer
a. Checklist – measures students' preferences, hobbies, attitudes, feelings, beliefs, interests,
etc. by marking a set of possible responses
b. Scales – instruments that indicate the extent or degree of one's response
1) Rating Scale – measures the degree or extent of one's attitudes, feelings, and perceptions
about ideas, objects, and people by marking a point along a 3- or 5-point scale
2) Semantic Differential Scale – measures the degree of one's attitudes, feelings, and
perceptions about ideas, objects, and people by marking a point along a 5-, 7-, or 11-point
scale of semantic adjectives
3) Likert Scale – measures the degree of one's agreement or disagreement with positive or
negative statements about objects and people
VALIDITY - the degree to which a test measures what it is intended to measure. It is the
usefulness of the test for a given purpose. It is the most important criterion of a good examination.
FACTORS influencing the validity of tests in general
Appropriateness of test – it should measure the abilities, skills and information it is
supposed to measure
Directions – it should indicate how the learners should answer and record their answers
Reading Vocabulary and Sentence Structure – it should be based on the intellectual level
of maturity and background experience of the learners
Difficulty of Items – it should have items that are not too difficult and not too easy, to be able
to discriminate the bright from the slow pupils
Construction of Items – it should not provide clues, so it will not be a test of clues; nor should
it be ambiguous, so it will not be a test of interpretation
Length of Test – it should be of sufficient length so it can measure what it is supposed to
measure, and not so short that it cannot adequately measure the performance we
want to measure
Arrangement of Items – it should have items arranged in ascending level of difficulty,
starting with the easy ones, so that pupils will persist in taking the test
Patterns of Answers – it should not allow the creation of patterns in answering the test
RELIABILITY – it refers to the consistency of scores obtained by the same person when retested
using the same instrument or one that is parallel to it.
Method | Type of Reliability Measure | Procedure | Statistical Measure
Test-Retest | Measure of stability | Give a test twice to the same group, with any time interval between sets, from several minutes to several years | Pearson r
Equivalent Forms | Measure of equivalence | Give parallel forms of the test at the same time | Pearson r
Test-Retest with Equivalent Forms | Measure of stability and equivalence | Give parallel forms of the test with increased time interval between forms | Pearson r
Split-Half | Measure of internal consistency | Give a test once; score equivalent halves of the test (e.g., odd- and even-numbered items) | Pearson r and Spearman-Brown Formula
Kuder-Richardson | Measure of internal consistency | Give the test once, then correlate the proportion/percentage of the students passing and not passing a given item | Kuder-Richardson Formulas 20 and 21
Cronbach Coefficient Alpha | Measure of internal consistency | Give a test once, then estimate reliability using the standard deviation per item and the standard deviation of the test scores | Kuder-Richardson Formula 20
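As an illustration of the Split-Half row above, here is a minimal Python sketch that scores odd- and even-numbered halves, correlates them with Pearson r, and applies the Spearman-Brown formula; the 0/1 item matrix is hypothetical.

```python
# A minimal sketch of split-half reliability with the Spearman-Brown
# correction, following the table above. The item data are hypothetical.
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half_reliability(item_matrix):
    """Score odd- vs even-numbered items, correlate, then step up."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown: full-test reliability

# Rows = examinees, columns = items scored 1 (right) or 0 (wrong).
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(data), 2))  # 0.49
```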
ITEM ANALYSIS
STEPS:
1. Score the test. Arrange the scores from highest to lowest.
2. Get the top 27% (upper group) and below 27% (lower group) of the examinees.
3. Count the number of examinees in the upper group (PT) and lower group (PB) who got each
item correct.
4. Compute for the Difficulty Index of each item:

Df = (PT + PB) / N

where N = the total number of examinees
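A minimal Python sketch of these steps follows. The data are hypothetical; N in Df = (PT + PB)/N is read here as the number of examinees in the two groups combined, and the discrimination index (PT − PB)/group size is a common companion statistic that the steps above do not spell out, so treat it as an assumption.

```python
# A minimal sketch of the item-analysis steps above; the data are hypothetical.

def analyze_item(scores, correct_flags):
    """scores: total test scores; correct_flags: 1/0 on one item, same order."""
    ranked = sorted(zip(scores, correct_flags), key=lambda t: t[0], reverse=True)
    n_group = max(1, round(len(ranked) * 0.27))      # top and bottom 27%
    pt = sum(flag for _, flag in ranked[:n_group])   # upper group correct
    pb = sum(flag for _, flag in ranked[-n_group:])  # lower group correct
    difficulty = (pt + pb) / (2 * n_group)           # Df = (PT + PB) / N
    discrimination = (pt - pb) / n_group             # common companion formula
    return difficulty, discrimination

scores = [10, 9, 9, 8, 7, 6, 5, 4, 3, 2]
flags = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
df, ds = analyze_item(scores, flags)
print(round(df, 2), round(ds, 2))  # 0.67 0.67
```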
INTERPRETATION
SCORING ERRORS AND BIASES
Measurement | Characteristics | Examples
Ordinal | Rank data; distances between points are indefinite | Income (1-low, 2-average, 3-high)
Interval | Distances between points are equal; no absolute zero | Test scores, Temperature
Ratio | Has an absolute zero | Height, Weight
DESCRIBING AND INTERPRETING TEST SCORES
MEASURES OF CORRELATION
Pearson r

Formula:

r = [ ΣXY/N − (ΣX/N)(ΣY/N) ] / [ √(ΣX²/N − (ΣX/N)²) · √(ΣY²/N − (ΣY/N)²) ]

Where:
X – scores in a test
Y – scores in a retest
N – number of examinees
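Here is a minimal Python sketch of the raw-score formula above, applied to hypothetical test and retest scores.

```python
# A minimal sketch of the Pearson r formula above; scores are hypothetical.

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy / n - (sx / n) * (sy / n)
    den = ((sxx / n - (sx / n) ** 2) ** 0.5) * ((syy / n - (sy / n) ** 2) ** 0.5)
    return num / den

test = [12, 15, 9, 20, 17]     # X: scores on the test
retest = [14, 14, 10, 19, 18]  # Y: scores on the retest
print(round(pearson_r(test, retest), 2))  # 0.96
```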
INTERPRETATION OF THE Pearson r
- One would normally hope to use a Pearson product-moment correlation on interval or ratio
data. The Spearman correlation can be used when the assumptions of the Pearson
correlation are markedly violated.
- The Spearman correlation determines the strength and direction of the monotonic
relationship between your two variables, rather than the strength and direction of the
linear relationship, which is what Pearson's correlation determines.
Example:
Steps
- Rank the scores for maths and English separately.
- The score with the highest value should be labelled "1" and the lowest score should be
labelled "10" (if your data set has more than 10 cases, then the lowest rank will be however
many cases you have).
- Look carefully at the two individuals that scored 61 in the English exam (highlighted in bold
in the original worked table).
- Notice their joint rank of 6.5: when you have two identical values in the data (called a "tie"),
you need to take the average of the ranks that they would otherwise have occupied.
- In this example, we have no way of knowing which score should be put in rank 6 and which
should be ranked 7.
- Notice that the ranks of 6 and 7 do not exist for English; these two ranks have been
averaged ((6 + 7)/2 = 6.5) and assigned to each of these "tied" scores.
There are two methods to calculate Spearman's correlation, depending on whether (1) your
data does not have tied ranks or (2) your data has tied ranks. The formula for when there are
no tied ranks is:

rs = 1 − (6 Σd²) / (n(n² − 1))

where d is the difference in ranks for each pair of cases and n is the number of cases.
What values can the Spearman correlation coefficient, rs, take?
- The Spearman correlation coefficient, rs, can take values from +1 to −1.
- An rs of +1 indicates a perfect association of ranks, an rs of zero indicates no
association between ranks, and an rs of −1 indicates a perfect negative
association of ranks.
- The closer rs is to zero, the weaker the association between the ranks.
Example:
(The worked score table is not reproduced here.) With n = 10, we have a ρ (or rs) of 0.67.
This indicates a strong positive relationship between the ranks individuals obtained in the
maths and English exams: the higher you ranked in maths, the higher you ranked in English
as well, and vice versa.
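The ranking-with-ties procedure and the formula above can be combined in a short Python sketch. The maths and English scores below are hypothetical stand-ins for the worked table (note the two 61s in English, which share the averaged rank 6.5):

```python
# A minimal sketch of Spearman's rho with average ranks for ties. The d²
# formula is exact only without ties; with a few averaged ties it is a close
# approximation, as in the handout's example. Scores are hypothetical.

def average_ranks(scores):
    """Rank the highest score as 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
english = [56, 75, 45, 71, 61, 64, 58, 80, 76, 61]  # two 61s -> rank 6.5
print(round(spearman_rho(maths, english), 2))  # 0.67
```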
How do you report a Spearman's correlation?
This depends on whether or not you have determined the statistical significance of the
coefficient. If you simply run the Spearman correlation without any statistical significance
tests, you are able to simply state the value of the coefficient.
How do you express the null hypothesis for this test?
H0: There is no [monotonic] association between the two variables [in the
population].
Remember, you are making an inference from your sample to the population that the
sample is supposed to represent. However, as this is a general understanding of an
inferential statistical test, the bracketed part is often not included.
It is important to realize that statistical significance does not indicate the strength of
Spearman's correlation. Statistical significance testing of the Spearman correlation does
not provide you with any information about the strength of the relationship. Achieving a
value of p = 0.001, for example, does not mean that the relationship is stronger than if you
achieved a value of p = 0.04, because the significance test only investigates whether you
can reject or fail to reject the null hypothesis.
Kuder-Richardson Formula 20

KR20 = (K / (K − 1)) × [1 − (Σpq / S²)]

Where:
K – number of items of the test
p – proportion of the examinees who got the item right
q – proportion of the examinees who got the item wrong
S² – variance, or the standard deviation squared, of the test scores
Kuder-Richardson Formula 21

KR21 = (K / (K − 1)) × [1 − (K · p̄ · q̄) / S²]

Where:
p̄ = X̄ / K (X̄ – mean of the test scores)
q̄ = 1 − p̄
(K and S² as defined for KR-20)
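Here is a minimal Python sketch of both formulas, applied to a hypothetical 0/1 item matrix; note that KR-21 needs only the number of items, the mean, and the variance.

```python
# A minimal sketch of KR-20 and KR-21 from the formulas above.
# The small 0/1 item matrix is hypothetical.
import statistics

def kr20(item_matrix):
    """item_matrix: rows = examinees, columns = items scored 1 or 0."""
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    s2 = statistics.pvariance(totals)               # S^2 of the total scores
    n = len(item_matrix)
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n  # proportion correct
        pq += p * (1 - p)                           # p * q for this item
    return (k / (k - 1)) * (1 - pq / s2)

def kr21(k, mean, s2):
    """KR-21 uses only the number of items, the mean, and the variance."""
    p_bar = mean / k
    return (k / (k - 1)) * (1 - k * p_bar * (1 - p_bar) / s2)

data = [[1, 1, 1, 0, 1], [1, 0, 1, 1, 1], [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1], [0, 1, 0, 0, 1]]
totals = [sum(r) for r in data]
print(round(kr20(data), 2))                                            # 0.65
print(round(kr21(5, statistics.mean(totals), statistics.pvariance(totals)), 2))  # 0.58
```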
RELATIVE MEASURES OF VARIATION
Multiplying the coefficient by 100 is an optional step to obtain a percentage, as opposed to a decimal.
Example: A researcher is comparing two multiple-choice tests with different conditions. In the first
test, a typical multiple-choice test is administered. In the second test, alternative choices (i.e.
incorrect answers) are randomly assigned to test takers. The results from the two tests are:
      Regular Test   Randomized Answers
SD    10.2           12.7
CV    17.03          28.35
Looking at the standard deviations of 10.2 and 12.7, you might think that the tests have
similar results. However, when you adjust for the difference in the means, the difference
between the two conditions becomes much clearer:
Regular test: CV = 17.03
Randomized answers: CV = 28.35
Coefficient of variation
- can also be used to compare variability between different measures.
Example: You want to compare IQ scores to scores on the Woodcock-Johnson III Tests of
Cognitive Abilities.
Note: The coefficient of variation should only be used to compare positive data on a ratio scale.
The CV has little or no meaning for measurements on an interval scale.
Examples of interval scales: temperatures in Celsius or Fahrenheit.
Example of a ratio scale: temperature on the Kelvin scale, which starts at zero and
cannot, by definition, take on a negative value (0 kelvin is the absence of heat).
Steps: calculating by hand for a population or a sample.
Example: Two versions of a test are given to students. One test has pre-set answers and a second
test has randomized answers. Find the coefficient of variation.

        Pre-set Answers   Randomized Answers
Mean    50.1              45.8
SD      11.2              12.9

Step 1: Divide the standard deviation by the mean for the first sample: 11.2 / 50.1 = 0.22355
Step 2: Multiply Step 1 by 100: 0.22355 × 100 = 22.355%
Step 3: Divide the standard deviation by the mean for the second sample: 12.9 / 45.8 = 0.28166
Step 4: Multiply Step 3 by 100: 0.28166 × 100 = 28.166%
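The whole computation reduces to one line per test, as this minimal Python sketch shows:

```python
# A minimal sketch of the coefficient-of-variation comparison above.

def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

regular = coefficient_of_variation(11.2, 50.1)     # pre-set answers
randomized = coefficient_of_variation(12.9, 45.8)  # randomized answers
print(round(regular, 3), round(randomized, 3))     # 22.355 28.166
```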
STANDARD SCORES
Indicate the pupil's relative position by showing how far his raw score is above or below average
Express the pupil's performance in terms of standard units from the mean
Represented by the normal probability curve, or what is commonly called the normal curve
Used to have a common unit to compare raw scores from different tests
PERCENTILE
tells the percentage of examinees that lies below one's score
Example:
P85 = 70 (This means the person who scored 70 performed better than 85% of the
examinees.)

Formula:
P85 = LL + i × ( (85% · N − CFb) / F_P85 )

where LL is the lower (exact) limit of the class containing P85, i is the class width, N is the
number of cases, CFb is the cumulative frequency below that class, and F_P85 is the
frequency of that class.
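Assuming the standard grouped-data reading of the symbols (LL, i, CFb, F), here is a minimal Python sketch with a hypothetical frequency table:

```python
# A minimal sketch of the grouped-data percentile formula above, assuming
# LL = lower (exact) limit of the class containing the percentile,
# i = class width, CFb = cumulative frequency below that class,
# F = frequency of that class. The frequency table is hypothetical.

def grouped_percentile(p, classes):
    """classes: list of (lower_limit, width, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    target = p * n                   # e.g. 0.85 * N for P85
    cum = 0                          # cumulative frequency below each class
    for lower, width, freq in classes:
        if cum + freq >= target:
            return lower + width * (target - cum) / freq
        cum += freq
    return classes[-1][0] + classes[-1][1]

# Five classes of width 5 starting at exact limit 39.5.
table = [(39.5, 5, 4), (44.5, 5, 6), (49.5, 5, 10), (54.5, 5, 7), (59.5, 5, 3)]
print(round(grouped_percentile(0.85, table), 2))  # 58.43
```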
Z-SCORES
tells the number of standard deviations equivalent to a given raw score

Formula:
Z = (X − X̄) / SD

Where:
X – individual's raw score
X̄ – mean of the normative group
SD – standard deviation of the normative group
Formula: T-score = 50 + 10(Z)
Example:
Joseph's T-score = 50 + 10(0.5) = 50 + 5 = 55
John's T-score = 50 + 10(−0.5) = 50 − 5 = 45
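Both formulas together, as a minimal Python sketch; the group mean, SD, and raw scores are hypothetical values chosen to reproduce the z-scores in the example:

```python
# A minimal sketch of the z-score and T-score formulas above.
# The group statistics and raw scores are hypothetical.

def z_score(x, mean, sd):
    """How many standard deviations a raw score lies from the group mean."""
    return (x - mean) / sd

def t_score(z):
    """Rescale z so the mean becomes 50 and each SD is worth 10 points."""
    return 50 + 10 * z

mean, sd = 80, 16
joseph, john = 88, 72  # raw scores giving z = +0.5 and -0.5
print(t_score(z_score(joseph, mean, sd)), t_score(z_score(john, mean, sd)))
# 55.0 45.0, as in the example above
```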
GRADING/REPORTING SYSTEMS: ADVANTAGES AND LIMITATIONS

Percentage (e.g. 70%, 86%)
Advantages: can be recorded and processed quickly; provides a quick overview of student performance relative to other students
Limitations: might not actually indicate mastery of the subject equivalent to the grade; too much precision

Letter (e.g. A, B, C, D, F)
Advantages: a convenient summary of student performance; uses an optimal number of categories
Limitations: provides only a general indication of performance; does not provide enough information for promotion

Pass-Fail
Advantages: encourages students to broaden their program of studies
Limitations: reduces the utility of grades; has low reliability

Checklist
Advantages: more adequate in reporting student achievement
Limitations: time-consuming to prepare and process; can be misleading at times

Written Descriptions
Advantages: can include whatever is relevant about the student's performance
Limitations: might show inconsistency between reports; time-consuming to prepare and read
GRADES:
a. Could represent:
how a student is performing in relation to other students (norm-referenced grading)
the extent to which a student has mastered a particular body of knowledge (criterion-referenced grading)
how a student is performing in relation to a teacher’s judgment of his or her potential
b. Could be for:
Certification that gives assurance that a student has mastered a specific content or
achieved a certain level of accomplishment
Selection that provides basis in identifying or grouping students for certain educational
paths or programs
Direction that provides information for diagnosis and planning
Motivation that emphasizes specific material or skills to be learned and helps
students to understand and improve their performance
Contract Grading System where each student agrees to work for a particular grade
according to agreed-upon standards.
1. Explain your grading system to the students early in the course and remind them of the
grading policies regularly.
2. Base grades on a predetermined and reasonable set of standards.
3. Base your grades on as much objective evidence as possible.
4. Base grades on the student’s attitude as well as achievement, especially at the elementary
and high school level.
5. Base grades on the student’s relative standing compared to classmates.
6. Base grades on a variety of sources.
7. As a rule, do not change grades, once computed.
8. Become familiar with the grading policy of your school and with your colleagues' standards.
9. When failing a student, closely follow school procedures.
10. Record grades on report cards and cumulative records.
11. Guard against bias in grading.
12. Keep pupils informed of their standing in the class.
Directions: Read and analyze each item carefully. Then, choose the best answer to each question.
2. Miss del Sol rated her students in terms of appropriate and effective use of some laboratory
equipment and measurement tools and whether they are able to follow the specified procedures. What
mode of assessment should Miss del Sol use?
A. Portfolio Assessment C. Traditional Assessment
B. Journal Assessment D. Performance-Based Assessment
4. St. Andrews School gave a standardized achievement test instead of giving a teacher-made
test to the graduating elementary pupils. Which could have been the reason why this was the
kind of test given?
A. Standardized test has items of average level of difficulty while teacher-made test has
varying levels of difficulty.
B. Standardized test uses multiple-choice format while teacher-made test uses the essay
test format.
C. Standardized test is used for mastery while teacher-made test is used for survey.
D. Standardized test is valid while teacher-made test is just reliable.
5. Which test format is best to use if the purpose of the test is to relate inventors and their
inventions?
A. Short-Answer C. Matching Type
B. True-False D. Multiple Choice
10. Which guideline in test construction is NOT observed in this test item?
EDGAR ALLAN POE WROTE ________________________.
14. Teacher Liza does norm-referenced interpretation of scores. Which of the following does she
do?
A. She uses a specified content as its frame of reference.
B. She describes group of performance in relation to a level of master set.
C. She compares every individual student score with others’ scores.
D. She describes what should be their performance.
15. All examinees obtained scores below the mean. A graphic representation of the score
distribution will be ________________.
A. negatively skewed C. leptokurtic
B. perfect normal curve D. positively skewed
20. A class is composed of academically poor students. The distribution will most likely be
A. leptokurtic. C. skewed to the left
B. skewed to the right D. symmetrical
21. Of the following types of tests, which is the most subjective in scoring?
A. Enumeration C. Essay
B. Matching Type D. Multiple Choice
22. Tom’s raw score in the Filipino class is 23 which is equal to the 70th percentile. What does this
imply?
A. 70% of Tom’s classmates got a score lower than 23.
B. Tom’s score is higher than 23% of his classmates.
C. 70% of Tom’s classmates got a score above 23.
D. Tom’s score is higher than 23 of his classmates.
24. The score distribution follows a normal curve. What does this mean?
A. Most of the scores are on the -2SD
B. Most of the scores are on the +2SD
C. The scores coincide with the mean
D. Most of the scores pile up between -1SD and +1SD
25. In her conduct of item analysis, Teacher Cristy found out that a significantly greater number
from the upper group of the class got test item #5 correctly. This means that the test item
A. has a negative discriminating power C. is easy
B. is valid D. has a positive discriminating power
26. Mr. Reyes tasked his students to play volleyball. What learning target is he assessing?
A. Knowledge C. Products
B. Skill D. Reasoning
27. Martina obtained an NSAT percentile rank of 80. This indicates that
A. She surpassed in performance 80% of her fellow examinees
B. She got a score of 80
C. She surpassed in performance 20% of her fellow examinees
D. She answered 80 items correctly
28. Which term refers to the collection of student’s products and accomplishments for a period for
evaluation purposes?
A. Anecdotal Records C. Observation Report
B. Portfolio D. Diary
29. Which form of assessment is consistent with the saying “The proof of the pudding is in the
eating”?
A. Contrived B. Authentic C. Traditional D. Indirect
30. Which error do teachers commit when they tend to overrate the achievement of students
identified by aptitude tests as gifted because they expect achievement and giftedness to go
together?
A. Generosity error C. Severity Error
B. Central Tendency Error D. Logical Error
31. Under which assumption is portfolio assessment based?
A. Portfolio assessment is dynamic assessment.
B. Assessment should stress the reproduction of knowledge.
C. An individual learner is inadequately characterized by a test score.
D. An individual learner is adequately characterized by a test score.
32. Which is a valid assessment tool if I want to find out how well my students can speak
extemporaneously?
A. Writing speeches
B. Written quiz on how to deliver extemporaneous speech
C. Performance test in extemporaneous speaking
D. Display of speeches delivered
33. Teacher J discovered that her pupils are weak in comprehension. To further determine which
particular skill(s) her pupils are weak in, which test should Teacher J give?
A. Standardized Test C. Diagnostic
B. Placement D. Aptitude Test
34. “Group the following items according to phylum” is a thought test item on _______________.
A. inferring C. generalizing
B. classifying D. comparing
36. Which will be the most authentic assessment tool for an instructional objective on working with
and relating to people?
A. Writing articles on working and relating to people
B. Organizing a community project
C. Home visitation
D. Conducting a mock election
37. While she is in the process of teaching, Teacher J finds out if her students understand what she
is teaching. What is Teacher J engaged in?
A. Criterion-referenced evaluation C. Formative Evaluation
B. Summative Evaluation D. Norm-referenced Evaluation
38. With types of test in mind, which does NOT belong to the group?
A. Restricted response essay C. Multiple choice
B. Completion D. Short Answer
39. Which tests determine whether the students accept responsibility for their own behavior or pass
on responsibility for their own behavior to other people?
A. Thematic tests C. Stylistic tests
B. Sentence completion tests D. Locus-of-control tests
B. The blank is at the beginning of the sentence.
C. It is a very short question.
D. It is an insignificant test item.
42. “By observing unity, coherence, emphasis and variety, write a short paragraph on taking
examinations.” This is an item that tests the students’ skill to _________.
A. evaluate C. synthesize
B. comprehend D. recall
43. Teacher A constructed a matching type of test. In her columns of items are a combination of
events, people, and circumstances. Which of the following guidelines in constructing a matching
type of test did she violate?
A. List options in an alphabetical order C. Make list of items heterogeneous
B. Make list of items homogeneous D. Provide three or more options
44. Read and analyze the matching type of test given below:
Direction: Match Column A with Column B. Write only the letter of your answer on the blank of the left column.
Column A Column B
___ 1. Jose Rizal A. Considered the 8th wonder of the world
___ 2. Ferdinand Marcos B. The national hero of the Philippines
___ 3. Corazon Aquino C. National Heroes’ Day
___ 4. Manila D. The first woman President of the Philippines
___ 5. November 30 E. The capital of the Philippines
___ 6. Banaue Rice Terraces F. The President of the Philippines who served several terms
45. A number of test items in a test are said to be non-discriminating. What conclusion/s can be
drawn?
I. Teaching or learning was very good.
II. The item is so easy that anyone could get it right.
III. The item is so difficult that nobody could get it.
46. Measuring the work done by a gravitational force is a learning task. At what level of cognition is
it?
A. Comprehension C. Evaluation
B. Application D. Analysis
47. Which improvement/s should be done in this completion test item:
An example of a mammal is ________.
A. The blank should be longer to accommodate all possible answers.
B. The blank should be at the beginning of the sentence.
C. The question should have only one acceptable answer.
D. The item should give more clues.
48. Here is Teacher D’s lesson objective: “To trace the causes of Alzheimer’s disease.” Which is a
valid test for this particular objective?
A. Can an Alzheimer’s disease be traced to old age? Explain.
B. To what factors can Alzheimer’s disease be traced? Explain.
C. What is an Alzheimer’s disease?
D. Do young people also get attacked by Alzheimer's disease? Support your answer.
49. What characteristic of a good test will pupils be assured of when a teacher constructs a table of
specifications for test construction purposes?
A. Reliability C. Construct Validity
B. Content Validity D. Scorability
51. In taking a test, one examinee approached the proctor for clarification on what to do. This
implies a problem on which characteristic of a good test?
A. Objectivity C. Scorability
B. Administrability D. Economy
52. Teacher Jane wants to determine if her students’ scores in the second grading is reliable.
However, she has only one set of test and her students are already on their semestral break.
What test of reliability can she use?
A. Test-retest C. Equivalent Forms
B. Split-half D. Test-retest with equivalent forms
53. Mrs. Cruz has only one form of test and she administered her test only once. What test of
reliability can she do?
A. Test of stability C. Test of correlation
B. Test of equivalence D. Test of internal consistency
54. What is the lower limit of the class with the highest frequency?
A. 39.5 B. 40 C. 44 D. 44.5
56. About what percent of the cases falls between +1 and -1 SD in a normal curve?
A. 43.1% B. 95.4% C. 99.8% D. 68.3%
57. Study this group of tests which was administered to a class to which Peter belongs, then answer
the question:
SUBJECT MEAN SD PETER’S SCORE
Math 56 10 43
Physics 41 9 31
English 80 16 109
In which subject(s) did Peter perform most poorly in relation to the group’s mean
performance?
A. ___
B. English
C. Physics
D. English and Physics
E. Math
58. Based on the data given in #57, in which subject(s) were the scores most widespread?
A. Math C. Cannot be determined
B. Physics D. English
59. A mathematics test was given to all Grade V pupils to determine the contestants for the Math
Quiz Bee. Which statistical measure should be used to identify the top 15?
A. Mean Percentage Score C. Percentile Rank
B. Quartile Deviation D. Percentage Score
60. A test item has a difficulty index of .89 and a discrimination index of -.44. What should the
teacher do?
A. Make it a bonus item. C. Retain the item.
B. Reject the item. D. Make it a bonus and reject it.
61. What is/are important to state when explaining percentile-ranked tests to parents?
I. What group took the test
II. That the scores show how students performed in relation to other students.
III. That the scores show how students performed in relation to an absolute measure.
62. Which of the following reasons for measuring student achievement is NOT valid?
A. To prepare feedback on the effectiveness of the learning process
B. To certify the students have attained a level of competence in a subject area
C. To discourage students from cheating during test and getting high scores
D. To motivate students to learn and master the materials they think will be covered by the
achievement test.
63. The computed r for English and Math score is -.75. What does this mean?
A. The higher the scores in English, the higher the scores in Math.
B. The scores in Math and English do not have any relationship.
C. The higher the scores in Math, the lower the scores in English.
D. The lower the scores in English, the lower the scores in Math.
4. A test item has a difficulty index of .81 and discrimination index of .13. What should the test
constructor do?
A. Retain the item. C. Revise the item.
B. Make it a bonus item. D. Reject the item.
7. Teacher Ria discovered that her pupils are very good in dramatizing. Which tool must have
helped her discover her pupils' strength?
A. Portfolio Assessment C. Journal Entry
B. Performance Assessment D. Pen-and-paper Test
8. Which among the following objectives in the psychomotor domain is highest in level?
A. To contract a muscle
B. To run a 100-meter dash
C. To distinguish distant and close sounds
D. To dance the basic steps of the waltz
9. If your LET items sample adequately the competencies listed in education courses syllabi, it can
be said that LET possesses _________ validity.
A. Concurrent B. Construct C. Content D. Predictive
10. In the context of the theory of multiple intelligences, what is one weakness of the pen-and-
paper test?
A. It is not easy to administer.
B. It puts the non-linguistically intelligent at a disadvantage.
C. It utilizes so much time.
D. It lacks reliability.
C. The students must be academically poor.
D. The scores congregate on the right side of the normal distribution.
14. The criterion of success in Teacher Lyn’s objective is that “the pupils must be able to spell 90%
of the words correctly”. Ana and 19 others correctly spelled 40 words only out of 50. This means
that Teacher Lyn:
A. attained her objective because of her effective spelling drill
B. attained her lesson objective
C. failed to attain her lesson objective as far as the twenty pupils are concerned
D. did not attain her lesson objective because of the pupil’s lack of attention
16. When a significantly greater number from the lower group gets a test item correctly, this implies
that the test item
A. is very valid C. is not highly reliable
B. is not very valid D. is highly reliable
19. If the scores of your test follow a negatively skewed distribution, what should you do?
Find out_________________.
A. Why your items were easy C. Why most of the scores are low
B. Why most of the scores are high D. Why some pupils scored high
21. Referring to assessment of learning, which statement on the normal curve is FALSE?
A. The normal curve may not necessarily apply to homogeneous class.
B. When all pupils achieve as expected their learning, curve may deviate from the normal
curve.
C. The normal curve is sacred. Teachers must adhere to it no matter what.
D. The normal curve may not be achieved when every pupil acquires targeted
competencies.
22. Aura Vivian is one-half standard deviation above the mean of her group in arithmetic and one
standard deviation above in spelling. What does this imply?
A. She excels both in arithmetic and spelling.
B. She is better in arithmetic than in spelling.
C. She does not excel in spelling nor in arithmetic.
D. She is better in spelling than in arithmetic.
23. You give a 100-point test; three students make scores of 95, 91 and 91, respectively, while the
other 22 students in the class make scores ranging from 33 to 67. The measure of central
tendency which is apt to best describe this group of 25 is
A. the mean C. an average of the median & mode
B. the mode D. the median
24. NSAT and NEAT results are interpreted against a set mastery level. This means that NSAT
and NEAT fall under
A. criterion-referenced test C. aptitude test
B. achievement test D. norm-referenced test
25. Which of the following is the MOST important purpose for using achievement test? To measure
the_______.
A. Quality & quantity of previous learning C. Educational & vocational aptitude
B. Quality & quantity of previous teaching D. Capacity for future learning
26. What should be AVOIDED in arranging the items of the final form of the test?
A. Space the items so they can be read easily
B. Follow a definite response pattern for the correct answers to insure ease of scoring
C. Arrange the sections such that they progress from the very simple to very complex
D. Keep all the items and options together on the same page.
29. Below is a list of methods used to establish the reliability of an instrument. Which method is
questioned for its reliability due to practice and familiarity?
A. Split-half C. Test-retest
B. Equivalent Forms D. Kuder Richardson Formula 20
C. Arm
D. Wrist
A. Analogy C. Short Answer Type
B. Rearrangement Type D. Problem Type
34. Teacher B wants to diagnose in which vowel sound(s) her students have difficulty. Which tool is
most appropriate?
A. Portfolio Assessment C. Performance Test
B. Journal Entry D. Paper-and-pencil Test
35. The index of difficulty of a particular test is .10. What does this mean? My students
____________.
A. gained mastery over the item.
B. performed very well against expectation.
C. found that the test item was neither easy nor difficult.
D. find the test item difficult.
36. Study this group of tests which was administered with the following results, then answer the
question that follows.
Subject Mean SD Ronnel’s Score
Math 56 10 43
Physics 41 9 31
English 80 16 109
In which subject(s) did Ronnel perform best in relation to the group’s performance?
A. Physics and Math C. Math
B. English D. Physics
37. Which applies when the distribution is concentrated on the left side of the curve?
A. Bell curve C. Leptokurtic
B. Positively skewed D. Negatively Skewed
39. Danny takes an IQ test thrice and each time earns a similar score. The test is said to possess
____________.
A. objectivity B. reliability C. validity D. scorability
40. The test item has a discrimination index of -.38 and a difficulty index of 1.0. What does this
imply to test construction? Teacher must__________.
A. recast the item C. reject the item
B. shelve the item for future use D. retain the item
41. Here is a sample TRUE-FALSE test item: All women have a longer life-span than men.
What is wrong with the test item?
A. The test item is quoted verbatim from a textbook.
B. The test item contains trivial detail.
C. A specific determiner was used in the statement.
D. The test item is vague.
42. In which competency do my students find greatest difficulty? In the item with the difficulty index
of
A. 1.0 B. 0.50 C. 0.90 D. 0.10
43. “Describe the reasoning errors in the following paragraph” is a sample thought question on
_____________.
A. synthesizing B. applying C. analyzing D. summarizing
44. In a one hundred-item test, what does Ryan’s raw score of 70 mean?
A. He surpassed 70 of his classmates in terms of score.
B. He surpassed 30 of his classmates in terms of score.
C. He got a score above the mean.
D. He got 70 items correct.
45. Study the table on item analysis for non-attractiveness and non-plausibility of distracters, based
on the results of a multiple-choice tryout test in math. The option marked with an asterisk is the
correct answer.

            A*   B   C   D
Upper 27%   10   4   1   1
Lower 27%    6   6   2   0
47. Which measure(s) of central tendency is (are) most appropriate when the score distribution is
badly skewed?
A. Mode C. Median
B. Mean and mode D. Mean
48. Is it a wise practice to orient our students and parents on our grading system?
A. No, this will court a lot of complaints later.
B. Yes, but orientation must be only for our immediate customers, the students.
C. Yes, so that from the very start, students and their parents know how grades are
derived.
D. No, grades and how they are derived are highly confidential.
49. With the current emphasis on self-assessment and performance assessment, which is
indispensable?
A. Numerical grading C. Transmutation Table
B. Paper-and-Pencil Test D. Scoring Rubric
50. “In the light of the facts presented, what is most likely to happen when …?” is a sample thought
question on ____________.
A. inferring B. generalizing C. synthesizing D. justifying
51. With grading practice in mind, what is meant by teacher’s severity error?
A teacher ___________.
A. tends to look down on student’s answers
B. uses tests and quizzes as punitive measures
C. tends to give extremely low grades
D. gives unannounced quizzes
52. Ms. Ramos gave a test to find out how the students feel toward their subject Science. Her first
item was stated as “Science is an interesting _ _ _ _ _ boring subject”. What kind of instrument
was given?
A. Rubric C. Rating Scale
B. Likert-Scale D. Semantic Differential Scale
55. When points in a scattergram are spread evenly in all directions, this means that:
A. The correlation between two variables is positive.
B. The correlation between two variables is low.
C. The correlation between two variables is high.
D. There is no correlation between two variables.
60. The following are trends in marking and reporting system, EXCEPT:
A. indicating strong points as well as those needing improvement
B. conducting parent-teacher conferences as often as needed
C. raising the passing grade from 75 to 80
D. supplementing subject grade with checklist on traits
As part of the course requirements, at the end of the semester I will require these handouts
to be returned to me fully signed. I will then return them to you to be used as a LET
reviewer.
This signature attests that I have read these lecture notes or have gone over these handouts.