
PRINCIPLES OF LANGUAGE ASSESSMENT: Chapter Two


1 Language Assessment: Principles & Classroom Practices
CHAPTER 1 TESTING, ASSESSING, & TEACHING
H. D. BROWN

2 What is a test? Test, Evaluation, Assessment?


-Test: a method of measuring a person’s ability, knowledge, or performance in a given domain
-Indirect nature of testing: competence is judged from limited information about one’s performance, which raises problems of generalization & interpretation

3 Assessment and Teaching


Assessment: an ongoing process that encompasses a much wider domain; it is carried out to make important curricular & instructional decisions or to judge learners’ abilities in language domains
- Assessment should be based on solid theories & research!
- A test is a subset of assessment, and a test score is not the same as the interpretation of that result.

4 Assessment and Teaching


Evaluation: a systematic way of judging performance in a given context, and a broader concept that deals with tests, assessment, and judgments based on national and institutional policies
Relationship of tests, assessment, and teaching (Figure 1.1): teaching provides ample opportunities to “play” with language for meaningful learning

5 Informal & Formal Assessment


Informal assessment in the classroom
Formal assessment: planned, systematic, time-constrained, based on a limited sample of behaviors; something is at stake! (e.g., high-stakes exams)
All tests are formal assessments, but not all assessment is formal.

6 Formative & Summative Assessment


Formative Assessment: looks into “forming” learners’ competences to help them develop further
Summative Assessment: measures or summarizes a learner’s performance at the end of a course or unit of instruction
When all tests are considered “summative,” attrition of learning occurs!

7 Norm-referenced & Criterion-referenced tests


Norm-referenced tests: compare results with others using mean, median, standard deviation, percentile rank, etc.
-Most standardized tests are norm-referenced tests.
Criterion-referenced tests: all those who reach certain criteria can pass!; used more for feedback on specific course or lesson objectives
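To make the contrast concrete, here is a minimal Python sketch using hypothetical scores for five students. The norm-referenced reading summarizes the group (mean, median, standard deviation, percentile rank) and places each student relative to the others; the criterion-referenced reading simply checks each score against a fixed cutoff. The cutoff of 70 and the percentile-rank convention (counting half of any tied scores) are illustrative assumptions.

```python
from statistics import mean, median, pstdev

# Hypothetical class scores (out of 100), for illustration only.
SCORES = {"Ana": 62, "Budi": 75, "Chen": 88, "Dewi": 75, "Eko": 91}


def percentile_rank(value, all_scores):
    """Percent of scores below `value`, counting half of the scores equal to it.

    This is one common percentile-rank convention; others exist.
    """
    below = sum(1 for s in all_scores if s < value)
    equal = sum(1 for s in all_scores if s == value)
    return 100 * (below + 0.5 * equal) / len(all_scores)


def norm_referenced_report(scores):
    """Interpret scores relative to the group: summary statistics plus each
    student's percentile rank."""
    values = list(scores.values())
    summary = {
        "mean": mean(values),
        "median": median(values),
        "sd": pstdev(values),  # population standard deviation of this group
    }
    ranks = {name: percentile_rank(s, values) for name, s in scores.items()}
    return summary, ranks


def criterion_referenced_report(scores, cutoff=70):
    """Interpret scores against a fixed criterion: anyone at or above it passes."""
    return {name: s >= cutoff for name, s in scores.items()}


if __name__ == "__main__":
    summary, ranks = norm_referenced_report(SCORES)
    print(summary)
    print(ranks)
    print(criterion_referenced_report(SCORES))
```

With this data, Chen's percentile rank is 70.0 and only Ana falls below the cutoff of 70. Under the criterion-referenced reading, every student who reaches the cutoff passes, no matter how many do.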

8 Approaches to Language Testing: A Brief History


1950s: test discrete elements of language
1970s-1980s: integrative view of testing
Discrete-point vs. Integrative Testing: Issue of (de)contextualization
& authenticity
Cloze test & dictation as integrative testing
Unitary trait hypothesis (Oller, 1983): “indivisible” nature of
language

9 Communicative Language Testing


Canale & Swain (1980): grammatical, discourse, sociocultural,
strategic competences
Bachman (1990): Organizational & Pragmatic competences
Bachman & Palmer (1996): Strategic competence

10 Performance-based Assessment
Types of PBA: oral interview, written essays, open-ended Q & A,
integrated skills performance, group performance, interactive tasks
Emphasizes language use and authenticity, but one may question the reliability of scoring!

11 Current Issues in Classroom Testing



New Views on Intelligence


-Alfred Binet vs. H. Gardner’s multiple intelligences (MI): linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal intelligences
-Goleman’s EQ (emotional quotient)
-tyranny of objectivity in high-stakes testing

12 Current Issues in Classroom Testing


Traditional vs Alternative Assessment
-one-shot, standardized vs. continuous & long-term
-timed, multiple choice vs. untimed, free-response
-decontextualized vs. contextualized
-scores only vs. individualized feedback
-norm-referenced vs. criterion-referenced

13 Current Issues in Classroom Testing


Traditional vs Alternative Assessment
-summative vs. formative
-product-oriented vs. process-oriented
-non-interactive vs. interactive performance
-extrinsic vs. intrinsic motivation
Beware of a bias toward alternative assessment!

14 Current Issues in Classroom Testing


Computer-based Testing (CBT)
-Computer-adaptive test (CAT)
-Internet-based test (IBT)
-CATs are based on Item Response Theory (IRT)
-See advantages & disadvantages (p. 15)
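The slide names IRT without explaining it. As a rough illustration only, the sketch below assumes the simplest IRT model, the one-parameter (Rasch) model, in which the probability of a correct answer depends on the gap between a test-taker's ability (theta) and an item's difficulty (b). It then shows the core adaptive idea behind a CAT: each next item is the one that is most informative at the current ability estimate. The five-item bank and the fixed-step ability update are hypothetical simplifications; operational CATs estimate ability with maximum-likelihood or Bayesian methods.

```python
import math

# Hypothetical item bank: item id -> difficulty b on the logit scale.
ITEM_BANK = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}


def rasch_probability(theta, b):
    """1PL (Rasch) model: probability that a test-taker of ability `theta`
    answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability `theta`: p * (1 - p).
    It peaks when the item's difficulty equals the ability estimate."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def next_item(theta, item_bank, administered):
    """Choose the not-yet-administered item that is most informative at `theta`."""
    remaining = [item for item in item_bank if item not in administered]
    if not remaining:
        raise ValueError("item bank exhausted")
    return max(remaining, key=lambda item: item_information(theta, item_bank[item]))


def update_theta(theta, correct, step=0.5):
    """Deliberately crude update: step up after a correct answer, down after
    an incorrect one (a stand-in for real ability estimation)."""
    return theta + step if correct else theta - step


if __name__ == "__main__":
    theta, administered = 0.0, set()
    for correct in (True, True, False):  # hypothetical response pattern
        item = next_item(theta, ITEM_BANK, administered)
        administered.add(item)
        theta = update_theta(theta, correct)
        print(item, theta)
```

Starting at theta = 0, the sketch first picks q3 (difficulty 0); after a correct answer, theta rises to 0.5 and q4 (difficulty 1.0) becomes the most informative remaining item.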

15 Basic Principles Suggested


Periodic assessment
Appropriate assessment
Pinpoint strengths and weaknesses
Periodic closure to modules within a curriculum
Promote student autonomy by encouraging self-evaluation of their
progress
Help learners set their own goals
Aid in evaluating teaching effectiveness

Language assessment
Chapter two: principles of language assessment

Abstract:
This paper describes the main principles of language assessment that teachers, as the constructors of assessments, should follow. There are five major principles of language assessment: practicality, reliability, validity, authenticity, and washback.

Teaching is not only about delivering knowledge to the students but also about constructing students’ understanding. A teacher has to know what lesson he is going to deliver to the students, how to deliver it, and how to assess it. Assessment is one of the most important parts of the teaching and learning process because it is a tool to measure whether the students know or understand the material. By giving assessments, the teacher can get information about students’ achievement.
Brown (2010: 25) states that there are five major principles of language assessment: practicality, reliability, validity, authenticity, and washback. They are described in more detail as follows:

1. Practicality
Brown (2010: 26) says that practicality refers to the logistical, down-to-earth, administrative issues involved in making, giving, and scoring an assessment instrument. Further, Mousavi in Brown (2010: 26) states that these include cost, the amount of time it takes to construct and to administer the test, ease of scoring, and ease of reporting the results.
Based on the definitions above, it can be concluded that practicality is defined in terms of cost, time, administration, and scoring/evaluation.
a. Cost
A good test should not be too expensive to conduct. A teacher should avoid conducting a test that requires an excessive budget.
b. Time
A good test should be of appropriate length: neither too long nor too short for the students to complete in the time available.
c. Administration
A good test should not be too complicated or difficult to conduct; it should be simple to administer.
d. Scoring/evaluation
A good test should come with tools that make it easy to score, such as scoring rubrics and an answer key.
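As an illustration of why an answer key makes scoring practical, the sketch below scores multiple-choice responses mechanically against a key. The five-item key and the rule that blank answers count as wrong are hypothetical choices for this example, not part of Brown's discussion.

```python
# Hypothetical five-item answer key: item number -> correct option.
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C", 5: "B"}


def score_with_key(responses, key=ANSWER_KEY):
    """Compare a student's responses with the key.

    Returns (raw score, list of missed item numbers). Missing or blank answers
    count as wrong; comparison ignores case and surrounding spaces.
    """
    missed = []
    for item, correct in key.items():
        answer = responses.get(item, "")
        if answer.strip().upper() != correct:
            missed.append(item)
    return len(key) - len(missed), missed


if __name__ == "__main__":
    print(score_with_key({1: "B", 2: "d", 3: "C", 4: "C"}))  # item 5 left blank
```

Because scoring is a mechanical comparison, two scorers using the same key will always reach the same result.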

2. Reliability
Brown (2010: 27) says that a reliable test is consistent and dependable: if you give the same test to the same student or to matched students on two different occasions, the test should yield similar results. In other words, a test given to the same students on different occasions should produce nearly the same results. For example, a student should get a similar score whether he or she takes the test, possibly with a different examiner, on a Monday morning or on a Tuesday afternoon.
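One common way to quantify this kind of consistency, offered here as an illustrative assumption since the paper names no statistic, is test-retest reliability: the Pearson correlation between the scores the same students obtain on two administrations. The Monday and Tuesday scores below are hypothetical.

```python
import math

# Hypothetical scores (out of 50) for the same six students on two occasions.
MONDAY = [42, 35, 28, 47, 31, 39]
TUESDAY = [40, 36, 30, 46, 29, 41]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / (spread_x * spread_y)


def retest_reliability(first, second):
    """Test-retest reliability: correlation of the same students' scores on two
    administrations. Close to 1.0 means students kept their relative standing."""
    return pearson_r(first, second)


if __name__ == "__main__":
    print(round(retest_reliability(MONDAY, TUESDAY), 2))
```

For these scores the correlation is about 0.96, meaning the students kept nearly the same relative standing on both occasions.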
Reliability and validity are related, as Bachman (2011: 160) points out: for a test score to be valid, it must be reliable. Brown (2010: 28-29) describes several issues related to reliability:
a. Student-related Reliability
According to Brown (2010: 28), the most common learner-related issue in reliability is caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an observed score deviate from one's true score. For example, if a student is in a bad mood because he is having a “bad day” while taking a test, his score may be affected.
b. Rater Reliability
Rater reliability concerns the scoring process. Inconsistent scoring can be caused by human error and by subjectivity.
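Rater reliability is often checked by having two raters score the same performances and measuring how often they agree (inter-rater reliability). The sketch below is an illustrative assumption, since the paper names no statistic: it computes simple percent agreement and Cohen's kappa, which corrects agreement for chance, on hypothetical band scores for ten essays.

```python
from collections import Counter

# Hypothetical band scores (1-4) given by two raters to the same ten essays.
RATER_A = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
RATER_B = [3, 2, 3, 3, 1, 2, 4, 4, 2, 2]


def percent_agreement(rater_a, rater_b):
    """Proportion of performances on which the two raters gave the same score."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from each rater's score distribution."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of performances")
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same band.
    p_e = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters always give the same single score")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(percent_agreement(RATER_A, RATER_B), round(cohens_kappa(RATER_A, RATER_B), 2))
```

Here the raters agree on 7 of 10 essays (70%), but because some agreement would happen by chance, kappa is lower, about 0.58.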

c. Test Administration Reliability


Test administration reliability concerns the situation and conditions in which the test is administered. For example, when a teacher wants to conduct a listening test, he should prepare a room that is comfortable for a listening activity. He has to make sure that the test will run well by considering everything related to it, such as the audio system (which should be clearly audible to all the students), the lighting, and the seating arrangement.
d. Test Reliability
Brown (2010: 29) says that sometimes the nature of the test itself can cause measurement errors; test reliability therefore refers to the test itself. For example, if a multiple-choice item has more than one correct answer, or if the teacher uses ambiguous sentences in the test, the scores can be affected.

3. Validity
As stated by Brown (2010: 30), a valid test measures exactly what it proposes to measure. For example, when students are given a reading test about human respiration, a valid test will measure reading ability, such as identifying general or specific information in the text, not their prior knowledge of biology about human respiration.
Brown (2010: 30-35) proposes five ways to establish validity: content validity, criterion validity, construct validity, consequential validity, and face validity.
a. Content Validity
Content validity refers to the correspondence between the content of the test and the language skills, structures, etc. that it is meant to measure.
For example:
A teacher wants to assess students’ speaking ability in a conversational setting but asks the students to answer paper-and-pencil multiple-choice questions requiring grammatical judgments. Such a test does not achieve content validity. The teacher should conduct a test that requires the students actually to speak with a classmate.
b. Criterion Validity
Criterion validity emphasizes the relationship between the test score and an outcome beyond the test itself. According to Brown (2010: 32), criterion validity usually falls into one of two categories: concurrent validity and predictive validity.
• Concurrent Validity
A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself.
For example:
The validity of a high score on the final exam of a foreign language course will be verified by the student’s actual proficiency in the language.
• Predictive Validity
Predictive validity concerns how well a test predicts a test-taker’s likelihood of future success.
For example:
The TOEFL test is intended to indicate how well someone will be able to use his or her English in the future.
c. Construct Validity
Construct validity asks whether a test actually taps into the theoretical construct (the concept or theory) that underlies the ability being measured.
For example:
Proficiency, communicative competence, and fluency are examples of linguistic constructs. When the teacher conducts a speaking test, the scoring analysis for the test includes several factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in the theoretical construct that claims them to be major components of oral proficiency. But if the teacher conducted a test that evaluated only pronunciation and grammar, one could be justifiably suspicious about the construct validity of that test.
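One practical way to act on this is to check a scoring rubric against the construct it claims to measure. In this minimal sketch, the five component names come from the example above, while the equal weights and the 1-5 ratings are hypothetical. A rubric that leaves components out, like the pronunciation-and-grammar-only test, is flagged.

```python
# The five components of oral proficiency named in the example above.
ORAL_PROFICIENCY_CONSTRUCT = {
    "pronunciation",
    "fluency",
    "grammatical accuracy",
    "vocabulary use",
    "sociolinguistic appropriateness",
}


def unscored_components(rubric, construct=ORAL_PROFICIENCY_CONSTRUCT):
    """Return the construct components the rubric does not score at all.
    An empty set means every component is represented."""
    return set(construct) - set(rubric)


def final_score(ratings, rubric):
    """Weighted sum of component ratings; the rubric maps component -> weight."""
    return sum(weight * ratings[component] for component, weight in rubric.items())


# Hypothetical rubrics: equal weights across all five components, versus a
# narrow rubric like the pronunciation-and-grammar-only test described above.
FULL_RUBRIC = {component: 1 for component in ORAL_PROFICIENCY_CONSTRUCT}
NARROW_RUBRIC = {"pronunciation": 1, "grammatical accuracy": 1}

if __name__ == "__main__":
    print(unscored_components(FULL_RUBRIC))
    print(unscored_components(NARROW_RUBRIC))
```

A coverage check like this cannot by itself establish construct validity; it only catches rubrics that leave part of the construct unscored.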
d. Consequential Validity
Consequential validity refers to the consequences of using a particular test for a particular purpose. A good test should have positive consequences for the students, so the teacher should consider the effect of assessment on students’ motivation, independent learning, study habits, and attitude toward schoolwork.
e. Face Validity
According to Gronlund in Brown (2010: 35), face validity is the extent to which students view the assessment as fair, relevant, and useful for improving learning. Moreover, Mousavi in Brown (2010: 35) states that face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.
Students may feel that a test isn’t testing what it’s supposed to test, and this might affect their performance and consequently the result of the test. To address such perceptions, the teacher, as test constructor, has to consider the following:
• Students will be more confident if they face a well-constructed, expected format with familiar tasks.
• Students will be less anxious if the test can be accomplished within an allotted time limit.
• Students will be optimistic if the items of the test are clear and uncomplicated.
• Students will find the test easier to do if the directions are clear.
• Students will be less worried if the test is related to their course work.

4. Authenticity
The fourth major principle of language assessment is authenticity, which deals with the “real world.” Teachers should construct tests whose items are likely to apply in real contexts of daily life. Brown (2010: 37) proposes several considerations that may help present authenticity in a test:
• The language in the test is as natural as possible.
• Topics are meaningful (relevant, interesting) for the learner.
• Some thematic organization to items is provided, such as through a story line or episode.
• Tasks represent, or closely approximate, real-world tasks.

5. Washback
According to http://teflpedia.com/Washback_effect, washback refers to the influence, either positive or negative, that an exam has on the way in which students are taught. In addition, Hughes in Brown (2010: 37) says that washback is the effect of testing on teaching and learning. Based on these definitions, it can be concluded that washback refers to the effect of testing on the teaching and learning process, and that it has two sides: positive and negative.
• Positive Washback
Positive washback has a beneficial influence on teaching and learning for both the teachers and the students.
For example:
The teacher gives a daily paper-based test. After the students finish answering the questions, they submit their papers and the teacher checks them. The teacher then gives not only a score but also feedback or comments on the students’ strengths and weaknesses in their test performance in order to motivate them.
• Negative Washback
A test that has negative washback is considered to have a negative influence on teaching and learning.
For example:
The teacher gives a daily paper-based test. After the students finish answering the questions, they submit their papers, and the teacher checks them and gives only a score without any comments.
In reality, a letter grade or numerical score alone is not enough. The students need feedback from their teacher.

CONCLUSION
Based on the explanation of the principles of language assessment above, we can conclude that a test is good if it is practical, highly reliable, valid, and authentic, and if it has positive washback. Teachers should apply these five principles when conducting tests in their teaching and learning process.

We're used to GPAs (Grade Point Averages) being calculated from these letter grades:

A=4
B=3
C=2
D=1
But some institutions use a 5-point system, while others even use a 9-point system!
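The GPA arithmetic can be sketched as follows. The letter-to-point mapping is the one listed above; F = 0 and the optional credit-hour weighting are added assumptions (common in U.S. practice but not stated on the slide).

```python
# Letter-to-point mapping from the list above; F = 0 is an added assumption.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa(grades, credits=None):
    """Grade point average on a 4-point scale.

    If `credits` is given, each course is weighted by its credit hours (an
    assumed convention); otherwise every course counts equally.
    """
    if not grades:
        raise ValueError("no grades to average")
    if credits is None:
        credits = [1] * len(grades)
    if len(credits) != len(grades):
        raise ValueError("need one credit value per grade")
    points = sum(GRADE_POINTS[g] * c for g, c in zip(grades, credits))
    return points / sum(credits)


if __name__ == "__main__":
    print(gpa(["A", "B", "C"]))
    print(gpa(["A", "C"], credits=[3, 1]))
```

For example, grades of A, B, and C in equally weighted courses give a GPA of 3.0, while an A in a 3-credit course and a C in a 1-credit course give (12 + 2) / 4 = 3.5.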

Alternatives to letter grading include:

Self-assessment
Narrative evaluations
Checklist evaluations
Conferences
And some institutions use narrative evaluations rather than letters or numbers:

These narratives can supplement grades, or replace them
They are expected to be read by admissions personnel at the institution to which the student is applying
Letter grades are sometimes assigned
Conferences

Washback offsets impracticality


Take students aside one-by-one while others complete work
Schedule a few office hours
Define objectives clearly & concisely
Cross-Cultural Factors

In many cultures it is unusual for a student to self-evaluate (the teacher assigns the grade and isn't questioned)
If anybody scored 100% on a test, the teacher would be considered
sub-par
A's (or the equivalent) are often reserved for a select few exceptional
students, and B is considered an excellent grade
A single final exam is often used to determine the student's final grade
The idea of a teacher preparing students to do their best on a test is
unusual
Bearing these factors in mind, it's important to create your own
philosophies of grading and evaluation consistent with your
philosophy of teaching and evaluation, but if you take a teaching
position abroad...

More guidelines for grading & evaluation:

Select appropriate criteria for grading


Define the relative weighting in calculating grades
Communicate the criteria to your students at the start of the course
Triangulate letter grade evaluations with more formative alternatives
that provide more washback
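Defining the relative weighting can be sketched as a weighted average of category scores followed by a mapping to letter grades. All category names, weights, and cutoffs below are hypothetical examples; the point is that they are fixed in advance and can be communicated to students at the start of the course.

```python
# Hypothetical weighting scheme, fixed in advance and announced to students.
WEIGHTS = {"quizzes": 0.20, "essays": 0.30, "midterm": 0.20, "final exam": 0.30}

# Hypothetical cutoffs: minimum weighted percentage for each letter, highest first.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def weighted_percentage(category_scores, weights=WEIGHTS):
    """Combine category percentages (0-100) using the announced weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * category_scores[category] for category, weight in weights.items())


def letter_grade(percentage, cutoffs=CUTOFFS):
    """Map a weighted percentage to the first letter whose cutoff it meets."""
    for minimum, letter in cutoffs:
        if percentage >= minimum:
            return letter
    return cutoffs[-1][1]  # below every cutoff: lowest letter


if __name__ == "__main__":
    pct = weighted_percentage({"quizzes": 90, "essays": 80, "midterm": 85, "final exam": 80})
    print(round(pct, 2), letter_grade(pct))
```

With category scores of 90, 80, 85, and 80, the weighted percentage is 83, which maps to a B under these cutoffs.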
Thank you!
Grading & Student Evaluation:
Institutional Expectations and Constraints
----

Ch. 11 Language Assessment Principles and Classroom Practices, H. Douglas Brown
Point systems are more prevalent globally than the letter grades we
use here in the U.S.

Self-assessment of end-of-course attainment of objectives is recommended by using:

checklists
guided journal entries directing students to reflect on content and linguistic objectives
an essay that self-assesses
a teacher-student conference
Pros of Narrative Evaluations:

Individualization
Multiple objective evaluation
Face validity
Washback potential
Cons of Narrative Evaluations:

Hard for evaluators to quantify
Time-consuming for teachers
Students ignore them (especially if a letter grade is included)
Teachers tend to reduce them to templates

Checklist evaluations can be used as a compromise alternative to narrative evaluations:

checklist with brief comments from the teacher
followed by a conference or response from the student
Pros of checklists:

increased practicality
increased reliability
maintains washback
Minimizes teacher time
Uniformity of measures
Teacher open-ended comments
Student can respond with goals
Easier for student to process
Be careful about overturning centuries of tradition!
Be tactful and sensitive or you might be on the next flight home!
Two questions to answer:
1. What are some alternatives to letter grading?
2. Which of the alternatives do you believe is best and why?
References:
Bachman, L.F. 2011. Fundamental Considerations in Language Testing. New York: Oxford University Press.

Brown, H.D., & Abeywickrama, P. 2010. Language Assessment: Principles and Classroom Practices (2nd Edition). New York: Pearson Education, Inc.

Teflpedia. Washback effect. http://teflpedia.com/Washback_effect (accessed March 9, 2018, at 08:11 am).
