Farhady, H. (1986) Theories of Language Testing
Introduction
As mentioned before, there has always been a close relationship between teaching and
testing theories. Modifications in the former have changed the nature of the latter
because testing methods tend to follow teaching methods. Of course, a clear-cut
distinction between methods of teaching and testing does not exist since each method
has placed certain priorities on the relative importance of a language component. Nor
does a chronological ordering of methods, either for testing or teaching, seem to be
applicable because there have been long periods of overlap and/or competition
between different methods at different times. Thus, the typology of theories and
methods of testing presented here should be considered only tentative. The main
purpose of classifying the theories, however, is to clarify the degree of emphasis given
to certain skills by various theories.
Translation Method
Discrete-Point Approach
Theories in language testing Hossein Farhady
The basic tenet of the discrete-point approach was that every single
language element, i.e., grammar, vocabulary, pronunciation, etc., should be tested
separately. A discrete-point item measures one and only one element of language at a
time. The proponents of this approach viewed language as a system composed of an
infinite number of items. They believed that testing a representative sample of these
hypothetical items would provide an accurate estimate of the examinees’ language
ability. Along with the influences from linguistics, which led to the development of
discrete-point items, the influence of psychology on language testing resulted in the
application of psychometric principles to language tests. The contribution of
linguistics and psychology to language testing helped test developers construct precise
and objective language tests with sound statistical properties.
Discrete-point tests, usually in the multiple-choice format, are still one of the
competing types of language tests. The following sample items demonstrate the
applicability of discrete-point tests to measuring various language abilities.
Discrete-point tests have proven to be highly reliable and reasonably valid measures
of language elements that they are intended to assess. However, because of new
advancements and modifications in language teaching methods, language testing
procedures had to be modified as well.
Integrative Approach
Cloze Tests
Cloze tests have probably been the most popular kind of tests in the last two decades.
Although the idea originated in the early fifties, the cloze tests were not utilized as
testing instruments until the late sixties and early seventies. Ever since the
employment of cloze procedures as measurement devices of language ability, an
enormous amount of research has been conducted on almost all aspects of these
procedures.
The word “cloze” seems to be a spelling corruption of the word “close” as in “Close
the door”. The term, coined for the first time by Taylor in 1953, is used to remind the
reader of the process of “closure” in Gestalt psychology. In the cloze procedure, the
closures are created by deleting certain words from a passage. The examinee, then, is
required to fill in the blanks with appropriate words based on contextual clues
provided in the passage.
It should be mentioned that the cloze procedure was originally developed to determine
the readability level of the texts written for native speakers. Later, it served as a
device for assessing the reading comprehension ability of native speakers of English.
Finally, it was utilized as an integrative measure to evaluate non-native speakers’
command of the language they attempt to learn. Consider the following example:
Hossein is a freshman and he (1) …… having all the problems that most
(2) …… have. As a matter of fact, his (3) …… started before he left home.
(4) …… had to do a lot of (5) …… that he did not like to do.
In this passage, there are five missing words. The testee is supposed to read the
passage and guess the missing words. According to the theory of expectancy
grammar, the more proficient the reader is, the more accurately he will supply the
missing words. In other words, a reader with a high command of the language will
easily reconstruct the passage and fill in the blanks with the intended words. In the
above example, the missing words are “is,” “freshmen,” “problems,” “He,” and
“things,” respectively. This example will help explain the technical characteristics
of the cloze tests presented below.
The first step is to define the cloze test. Although various definitions have been
suggested, most scholars agree that the cloze test is any passage of appropriate length
and reasonable difficulty with every “nth” word deleted. The definition, of course,
seems ambiguous. What constitutes appropriate length? What is meant by reasonable
difficulty? What is the purpose and the number of “n”? In the following sections,
these questions will be answered to clarify the issue.
The second step is to determine what the appropriate level of difficulty is. Great care
must be exercised in selecting a passage for developing a cloze test. If the passage is
beyond the linguistic ability of the test takers, they will not understand it, and thus
will not be able to determine the missing words. An easy passage, on the other hand,
will result in perfectly correct responses for the missing words, and thus will not
provide any useful information about the differences among the examinees’
proficiency levels in English. Therefore, the passage should have an appropriate level
of difficulty.
The third step is to determine what the “n” is. This letter simply refers to the deletion
interval: counting through the passage, every “nth” word is deleted. Of course, the
greater the number of words between two deletions, the easier the guessing of the
missing words because more contextual clues are available to the examinee. Therefore,
to determine the appropriate number of words between two deletions, researchers
developed cloze tests with every 3rd, 5th, 7th, 9th, and 11th word deleted. The results
of experiments revealed that the passage with every 7th word deleted is most reliable
and valid. Consequently, the “n” was set to be “7” as one of the principles of what
was later called the “standard cloze test”.
Another line of research investigated the influence of leaving the first and the last
sentences of the passage intact on the characteristics of experimental cloze passages.
Therefore, the researchers developed cloze passages in which they did not delete any
words from the first and the last sentences in the passage. Compared with cloze
tests in which deletion started in the first sentence and continued through the last
sentence of the passage, these experimental cloze tests showed higher reliability
and validity. Thus, another criterion for the standard cloze test was set: the first
and the last sentences of the passage are left intact.
The fourth step is to determine the number of deletions in a cloze test. Again,
researchers developed cloze passages with 100, 90, 80, 70, 60, 50, 40, 30, and 20
deletions. The results indicated that a cloze passage having 25 to 30 blanks was the
most efficient one in terms of reliability and validity. It was demonstrated that, beyond
30 blanks, the gains in reliability and validity were not statistically significant and so
did not justify increasing the number of deletions. Consequently, the third criterion for
the standard cloze test was set: the number of deletions should be between 25 and 30.
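The three construction criteria described so far (every 7th word deleted, first and last sentences left intact, no more than 30 blanks) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a procedure taken from the research: the function name make_cloze, the naive sentence splitter, and the blank format are hypothetical choices made for the example.

```python
import re


def make_cloze(passage, n=7, max_blanks=30):
    """Return (cloze_text, answer_key) for a passage.

    Every n-th word is deleted, counting only the words between the
    first and the last sentences, which are left intact.
    """
    # Assumption: a sentence ends with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    if len(sentences) < 3:
        raise ValueError("passage needs at least three sentences")

    first, last = sentences[0], sentences[-1]
    words = " ".join(sentences[1:-1]).split()

    answers = []
    for i in range(n - 1, len(words), n):
        if len(answers) == max_blanks:
            break
        # Keep trailing punctuation outside the blank.
        core = words[i].rstrip(".,;:!?")
        if not core:
            continue
        tail = words[i][len(core):]
        answers.append(core)
        words[i] = f"({len(answers)}) ______{tail}"

    return " ".join([first] + words + [last]), answers
```

With the defaults, the function applies n = 7 and stops at 30 blanks. Choosing a passage of suitable difficulty remains a judgment that the code does not make.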
Once the number of deletions is determined, the reasonable length of the passage is
easy to decide on. Assuming that there are 30 blanks in the cloze test with
every 7th word deleted, the passage will be about 210 words long. Allowing 20 to 40
words for the first and the last sentences, which should be left intact, the reasonable
length of the passage for a cloze test would be around 250 words. Thus, a standard
cloze is a passage of appropriate difficulty in which every 7th word is deleted and the
first and the last sentences are left intact.
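The length estimate above is simple arithmetic and can be checked directly. The sketch below assumes a hypothetical helper, estimated_passage_length, which multiplies the number of blanks by n and adds the words of the intact first and last sentences.

```python
def estimated_passage_length(blanks=30, n=7, intact_words=40):
    """Approximate word count of a cloze passage.

    blanks * n words cover the deletion zone; intact_words is the
    combined length of the untouched first and last sentences.
    """
    return blanks * n + intact_words


# 30 blanks at every 7th word, plus 20 to 40 intact words.
shortest = estimated_passage_length(30, 7, 20)  # 230
longest = estimated_passage_length(30, 7, 40)   # 250
```

The upper figure matches the length of around 250 words given in the text.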
The development of the test necessitates scoring procedures after the test is
administered. In the cloze test, too, a simple and objective scoring technique had to be
developed. Therefore, to facilitate the scoring procedure, scholars considered each
blank as an item and developed various scoring methods, among which the “exact
word method” and the “acceptable word method” are very common and frequently
used.
In the “exact word method”, an item is given a point if and only if the originally
deleted word is provided by the examinee. Although this method makes the
examinee’s task quite difficult, it is often employed in non-native environments. In
the acceptable word method, on the other hand, a supplied word will be considered
correct, and thus given credit, if it is acceptable in the context of the passage. That is,
if the supplied word makes the context meaningful, it will be considered as the correct
response.
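The difference between the two scoring methods can be illustrated with a short Python sketch. It uses the answer key of the sample passage given earlier. The function names, the case-insensitive comparison, and the lists of acceptable alternatives are assumptions made for illustration only; real lists of acceptable words would have to be compiled by the test developer.

```python
def score_exact(responses, key):
    """One point per blank only when the originally deleted word is supplied."""
    return sum(r.strip().lower() == k.lower() for r, k in zip(responses, key))


def score_acceptable(responses, acceptable):
    """One point per blank when the response is in that blank's accepted set."""
    points = 0
    for response, accepted in zip(responses, acceptable):
        if response.strip().lower() in {word.lower() for word in accepted}:
            points += 1
    return points


# Answer key from the sample passage.
key = ["is", "freshmen", "problems", "He", "things"]
# Hypothetical accepted alternatives for each blank (illustration only).
acceptable = [{"is"}, {"freshmen", "students"}, {"problems", "troubles"},
              {"he"}, {"things", "work"}]
responses = ["is", "students", "problems", "he", "work"]

exact_points = score_exact(responses, key)                   # 3
acceptable_points = score_acceptable(responses, acceptable)  # 5
```

The same set of responses earns 3 points under the exact word method and 5 under the acceptable word method, which shows why the former is the harder task for the examinee.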
Although research results indicate that there is no significant difference between the
two methods of scoring, the “acceptable word method” has proven to be more suitable
for the examinees. The major difficulty in this method, however, concerns the
identification of acceptable words for a given blank. The most practical way to
determine the acceptable words is to pretest the cloze test with a sufficient number of
native or native-like speakers. However, native speakers may not be readily available
in a non-native-speaking environment. On the other hand, the “exact word method”,
though difficult for the test takers, does not need the pre-testing of the test with native
speakers. Therefore, in EFL situations, the “exact word method” is recommended.
The cloze procedures explained here are referred to as the open-ended form of the
standard cloze. A different version of the standard cloze is in the multiple-choice
format. In this type, four choices are provided for each blank, and the examinee is
required to choose the most appropriate word from among the given alternatives. Like
other recognition tests, the multiple-choice cloze assesses the examinees’ passive
knowledge of the language, whereas the open-ended cloze measures examinees'
productive linguistic abilities. It should be pointed out that the multiple-choice cloze
tests are easier to take than the open-ended ones because the nature of production
tasks requires a higher level of competency than recognition activities.
Using either form, language testers are recommended to employ standard forms of the
cloze as testing instruments. Other varieties of cloze procedures are considered useful
activities for instructional purposes. They should not, however, be used as testing
devices. Some varieties of cloze procedure, referred to as “alternative cloze”, for
classroom activities follow:
In addition to these sample varieties, any cloze procedure that does not follow the
principles of standard cloze would be considered an alternative cloze. Alternative
cloze tests, as mentioned before, do not constitute reliable or valid measurement
devices. Therefore, they should be used in informal situations but not as testing
instruments.
It should also be kept in mind that passages from different scientific areas such as
humanities, engineering, medicine, etc., can be easily developed as cloze tests for the
students in their respective majors. In addition, the number of deletions, the number of
words between the two deletions, the kinds of words to be deleted, and the difficulty
level of the passage would give great maneuverability to the teachers and educators in
employing the cloze procedures. Therefore, cloze tests can serve as a versatile device
for both instructional and evaluative purposes.
Dictation
The other integrative test, dictation, is one of the older instruments for measuring
language ability. Unfortunately, however, it did not receive any serious attention until
the late sixties for two major reasons. First, early scholars claimed that
dictation was not an economical test and that its scoring procedure
was not objective. Second, some testers misused dictation because they did not
pay attention to its concept and purpose. They used dictation as a spelling
test, which was completely against the principles of integrative testing in general, and of
dictation in particular.
During the last two decades, fortunately, testing specialists observed the utility of
dictation tests, and thus, dictation obtained its deserved position among other tests.
Dictation became one of the most highly respected integrative measures of language
ability. Research on dictation has also demonstrated high validity and reasonable
reliability for such tests.
As with cloze tests, certain criteria have been set for the so-called “standard
dictation”. A standard dictation is a passage of appropriate length (usually 100-150
words) with reasonable difficulty (determined by readability scales) read three times
in the following manner.
In the first reading, the passage is read, preferably on tape, at the normal rate of
speech. In this stage, the examinees only listen to get the general idea of the passage.
They are not allowed to write anything down at this step.
In the second reading, the passage is read at the normal rate of speech, with
sufficient pauses at appropriate points and with punctuation marks supplied. During the
pauses, the examinees are required to write down the chunks of language they
hear. The length of each pause should be determined in advance according to
the number of words within the chunk to be written down. The following
example demonstrates the places where pauses should be made.
It is often observed / that university students / have more problems / than those in high school.
It should be clear that the pause should be given at the point at which the natural
reading process requires it.
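Since the length of each pause depends on the number of words in the chunk, the pauses for a marked passage can be planned in advance. The Python sketch below is a hypothetical illustration: the function name pause_plan and the rate of 2 seconds per word are assumptions, as the text does not specify a rate.

```python
def pause_plan(marked_text, seconds_per_word=2.0):
    """Return (chunk, pause_seconds) pairs for a text with '/' pause marks."""
    chunks = [chunk.strip() for chunk in marked_text.split("/") if chunk.strip()]
    # Assumed rule: pause length grows linearly with the words in the chunk.
    return [(chunk, len(chunk.split()) * seconds_per_word) for chunk in chunks]


sentence = ("It is often observed / that university students / "
            "have more problems / than those in high school.")
plan = pause_plan(sentence)
# Chunks of 4, 3, 3 and 5 words give pauses of 8.0, 6.0, 6.0 and 10.0 seconds.
```

A different rate can be passed as seconds_per_word to suit the level of the examinees.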
In the third reading, the passage is read as it was in the first reading. The purpose of
this last reading is to give the examinees a chance to correct the words or to write
down the words they might have missed in the previous readings.
It should be emphasized that ignoring some of the spelling errors in dictation does not
imply, by any means, that spelling is not important in language teaching or language
testing. Spelling requires a long time to be mastered through tedious work on the part
of both the teacher and the student. The main point, however, is that using dictation
for spelling purposes is unacceptable because it would serve neither the purpose of
dictation nor that of spelling. Therefore, these two tests, dictation and spelling, should
be kept quite separate from one another and used appropriately.
Cloze and dictation-type tests have been used widely as integrative measures of
language proficiency. It is quite possible, however, that some teachers may not be
familiar with such tests. It is therefore recommended that teachers familiarize
themselves with these new tests and utilize such techniques in the classroom. This
would serve two purposes. First, students will become familiar with these new types
of language tasks; and second, students will benefit from the instructional values of
these activities. These procedures can be used as effective exercises to teach language
in its natural form without decomposing it into discrete items. Of course, these
developments do not mark the end of improvements in language testing because
language-teaching theories continue to change and, consequently, testing procedures are modified.
The latest modifications in testing and teaching theories are presented below.
You are applying to a university and need a letter of recommendation. You go to a professor, who is
also your friend, and say:
a. I’d appreciate it if you could write a letter of recommendation for me.
b. I want you to write a letter of recommendation for me.
c. I wonder if you can write a letter of recommendation for me.
d. Hey, give me a recommendation letter.
This functional item has the following unique characteristics that no other test item
possesses. These characteristics are:
1. The function to be fulfilled in this item is “getting things done,” and in this
particular case, “requesting someone to do something.” The stem
demonstrates the function that someone, i.e., the student, wants someone else,
i.e., the professor, to do something, i.e., write a letter of recommendation.
4. The social status of these participants is unequal because one of them is a student
and the other is a professor.
8. The last alternative, which is the only distractor, is neither linguistically accurate
nor socially appropriate.
In this section, theories of language testing were discussed. It was mentioned that
testing theories follow teaching methodologies. The following table illustrates this
correspondence.
The close relationship between teaching and testing methods dictates some
pedagogical implications. That is, a particular teaching method, utilized in an
educational setting, requires teaching materials to be developed based on the
principles of that method. More importantly, the testing approach should be in
harmony with the teaching method employed.
* This is the revised version of the paper printed in Roshd Foreign Language Teaching Journal (1986),
2(4). Tehran, Iran.