Farhady, H. (1986) Theories of Language Testing


Theories in language testing Hossein Farhady

Fundamental Concepts in Language Testing: Theories*
Hossein Farhady
Iran University of Science and Technology

Introduction

In the previous sections, some fundamental concepts in language testing including
function, form, and characteristics of language tests were discussed. In this section,
another fundamental concept, namely, the theories of language testing will be briefly
explained, and various test types conforming to differing theories will be presented.

As mentioned before, there has always been a close relationship between teaching and
testing theories. Modifications in the former have changed the nature of the latter
because testing methods tend to follow teaching methods. Of course, a clear-cut
distinction between methods of teaching and testing does not exist since each method
has placed certain priorities on the relative importance of a language component. Nor
does a chronological ordering of methods, either for testing or teaching, seem to be
applicable because there have been long periods of overlap and/or competition
between different methods at different times. Thus, the typology of theories and
methods of testing presented here should be considered only tentative. The main
purpose of classifying the theories, however, is to clarify the degree of emphasis given
to certain skills by various theories.

Following different teaching theories, various testing theories such as translation,
discrete-point, integrative, and functional have been developed. Each theory
deserves a separate paper. Because of time and space limitations, however, only a
brief explanation of each theory is given here.

Translation Method

Before the theoretical developments in linguistics and related fields, a
well-established theory for language testing did not exist. Testing language ability was
accomplished through subjective measures such as translation tasks and essay-type
question tests. Such testing procedures, which stemmed from the old grammar-
translation method of teaching, paid little or no attention to the statistical
characteristics of the tests. In this method, students were given a passage or a set of
sentences to translate from the source language into the target language or from the
target language into the source language. In other cases, essay-type questions were
given to the students and their responses were scored based on subjective evaluations
of one or two raters. The accuracy and fairness of such evaluations were at best
questionable. These inadequacies forced language testing specialists to devise
objective measures and develop new approaches.

Discrete-Point Approach

Developments in linguistics led to the emergence of structural linguistics and
advancements in psychology resulted in behavioristic psychology. The Audio-Lingual
approach to language teaching evolved out of the interactions of the principles of
structural linguistics and behavioristic psychology. Under the influence of the
audio-lingual approach to language teaching, testing procedures were fundamentally
modified. The theory of language testing developed following the audio-lingual
approach assumed that language was a system of habits concerning matters of form,
meaning, and distribution at several levels of structure, namely, those of sentence,
clause, phrase, word, morpheme, and phoneme (Lado, 1961). Accordingly, measuring
the linguistic properties of these habits at various levels was assumed to manifest the
language ability of the learners. This kind of testing, which became popular all over
the world, was referred to as the discrete-point approach.

The basic tenet of the discrete-point approach was that every single
language element, i.e., grammar, vocabulary, pronunciation, etc., should be tested
separately. A discrete-point item measures one and only one element of language at a
time. The proponents of this approach viewed language as a system composed of an
infinite number of items. They believed that testing a representative sample of these
hypothetical items would provide an accurate estimate of the examinees’ language
ability. Along with the influences from linguistics, which led to the development of
discrete-point items, the influence of psychology on language testing resulted in the
application of psychometric principles to language tests. The contribution of
linguistics and psychology to language testing helped test developers construct precise
and objective language tests with sound statistical properties.

Discrete-point tests, usually in the multiple-choice format, are still one of the
competing types of language tests. The following sample items demonstrate the
applicability of discrete-point tests to measuring various language abilities.

1. Sample spelling item:

The student hears BOOK.
There is a BOOK on the table.
The student writes BOOK.

2. Sample vocabulary item:

Integrity means:
a. intelligence     c. intrigue
b. uprightness      d. weakness

3. Sample structure item:

Zahra …… in Tehran since 1350.
a. is living        c. has lived
b. lives            d. lived

Discrete-point tests have proven to be highly reliable and reasonably valid measures
of the language elements they are intended to assess. However, because of new
advancements and modifications in language teaching methods, language testing
procedures had to be modified as well.

Integrative Approach

New trends in linguistics and psychology questioned the foundations of behavioristic
psychology and structural linguistics. Chomsky (1965) presented an innovative
approach to the description of language referred to as the generative-transformational
theory. Psychologists moved towards a new school, namely, cognitive psychology.
The principles of generative-transformational linguistics along with those of cognitive
psychology gave birth to a new approach in language teaching called the
cognitive-code learning theory. Following the cognitive-code learning theory, a new
approach to language testing, called the integrative theory, was formed.

Advocates of cognitive-code learning and integrative testing believe that language is a
holistic phenomenon; thus, it should not be broken into discrete items. They contend
that knowledge of the discrete items does not necessarily guarantee the ability to use
language in real-life situations. In other words, they claim that the sum of parts is not
necessarily equal to the whole. That is, the sum of the knowledge of structure,
vocabulary, and other language elements does not necessarily mean that the learner
will be able to use language as an integrative tool for communication.

Well-known integrative tests include oral interviews, reading comprehension tests,
compositions, listening comprehension tests, dictation-type tests, and cloze
procedures. Among these tests, oral interviews are quite time-consuming and costly;
reading and listening comprehension tests are well established through
multiple-choice items; compositions are less reliable because they are scored
subjectively; cloze and dictation-type tests, however, are fairly new and largely
unknown in most educational circles. Therefore, a detailed explanation of these two
tests seems warranted.

Cloze Tests

Cloze tests have probably been the most popular kind of tests in the last two decades.
Although the idea originated in the early fifties, the cloze tests were not utilized as
testing instruments until the late sixties and early seventies. Ever since the
employment of cloze procedures as measurement devices of language ability, an
enormous amount of research has been conducted on almost all aspects of these
procedures.

The word “cloze” seems to be a spelling corruption of the word “close” as in “Close
the door”. The term, coined for the first time by Taylor in 1953, is used to remind the
reader of the process of “closure” in Gestalt psychology. In the cloze procedure, the
closures are created by deleting certain words from a passage. The examinee, then, is
required to fill in the blanks with appropriate words based on contextual clues
provided in the passage.

It should be mentioned that the cloze procedure was originally developed to determine
the readability level of the texts written for native speakers. Later, it served as a
device for assessing the reading comprehension ability of native speakers of English.
Finally, it was utilized as an integrative measure to evaluate non-native speakers’
command of the language they attempt to learn. Consider the following example:

Hossein is a freshman and he (1) …… having all the problems that most
(2) …… have. As a matter of fact, his (3) …… started before he left home.
(4) …… had to do a lot of (5) …… that he did not like to do.

In this passage, there are five missing words. The testee is supposed to read the
passage and guess the missing words. According to the theory of expectancy
grammar, the more proficient the reader is, the better he will decide on the missing
words. In other words, if the reader has a high command of the language, he will
easily reconstruct the passage and fill in the blanks with the appropriate words
intended. In the above example, the missing words are “is,” “freshmen,” “problems,”
“He,” and “things,” respectively. This example will help in explaining the
technical characteristics of cloze tests presented below.

The first step is to define the cloze test. Although various definitions have been
suggested, most scholars agree that the cloze test is any passage of appropriate length
and reasonable difficulty with every “nth” word deleted. The definition, of course,
seems ambiguous. What constitutes appropriate length? What is meant by reasonable
difficulty? What is the purpose and the value of “n”? In the following sections,
these questions will be answered to clarify the issue.

The second step is to determine what the appropriate level of difficulty is. Great care
must be exercised in selecting a passage for developing a cloze test. If the passage is
beyond the linguistic ability of the test takers, they will not understand it, and thus
will not be able to determine the missing words. An easy passage, on the other hand,
will result in perfectly correct responses for the missing words, and thus will not
provide any useful information about the differences among the examinees’
proficiency levels in English. Therefore, the passage should have an appropriate level
of difficulty.

To determine the appropriateness of the passage difficulty, certain readability scales
have been developed. One frequently used readability scale was constructed by Fry
(1977). According to this scale, passages are rated as suitable for native speakers from
grade 1 up to grade 12. Research findings indicate that non-native speakers’ command
of language corresponds to grades 4 to 6 at the pre-university stage, grades 7 to 9 at
the university undergraduate stage, and grades 10 to 12 at the graduate level. Thus, to
find a passage of appropriate difficulty, one has to apply this scale and find out
whether the passage is suitable for the group of examinees. It should be mentioned
that this readability scale is only one of the many available scales. Although it is
frequently used for research purposes, it should not be considered “the scale”.

The third step is to determine what the “n” is. This letter simply refers to the
deletion interval: every “nth” word of the passage is deleted. Of course, the greater
the number of words between two deletions, the easier the guessing of the missing
words because more contextual clues are available for the examinee. Therefore, to
determine the appropriate number of words between two deletions, researchers
developed cloze tests with every 3rd, 5th, 7th, 9th, and 11th word deleted. The results
of experiments revealed that the passage with every 7th word deleted is most reliable
and valid. Consequently, the “n” was set to be “7” as one of the principles of what was
later called the “standard cloze test”.

Another line of research investigated the influence of leaving the first and the last
sentences of the passage intact on the characteristics of experimental cloze passages.
To this end, the researchers developed cloze passages in which they did not delete any
words from the first and the last sentences in the passage. Compared with cloze
tests in which deletions started in the first sentence and continued through the last
sentence of the passage, these experimental cloze tests showed higher reliability
and validity. Thus, another criterion for the standard cloze test was set: the first and
the last sentences of the passage are left intact.

The fourth step is to determine the number of deletions in a cloze test. Again,
researchers developed cloze passages with 100, 90, 80, 70, 60, 50, 40, 30, and 20
deletions. The results indicated that a cloze passage having 25 to 30 blanks was the
most efficient one in terms of reliability and validity. It was demonstrated that, beyond
30 blanks, the gains in reliability and validity were not statistically significant enough
to justify increasing the number of deletions. Consequently, the third criterion for the
standard cloze test was set: the number of deletions should be between 25 and 30.

Once the number of deletions is determined, the reasonable length of the passage
is easy to decide on. Assuming that there are 30 blanks in the cloze test with
every 7th word deleted, the passage will be about 210 words long. Allowing 20 to 40
words for the first and the last sentences, which should be left intact, the reasonable
length of the passage for a cloze test would be around 250 words. Thus, a standard
cloze is a passage of appropriate difficulty, about 250 words long, in which every 7th
word is deleted to produce 25 to 30 blanks and the first and the last sentences are
left intact.
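
The construction rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a validated test-construction tool. The function names, the naive sentence splitting on end punctuation, and the blank format "(k)______" are all assumptions made for the example. The default of 40 intact words is simply the upper end of the 20 to 40 words mentioned above.

```python
import re


def estimated_length(blanks=30, n=7, intact_words=40):
    """Rough passage length: blanks * n words plus the intact first/last sentences."""
    return blanks * n + intact_words


def make_standard_cloze(passage, n=7, max_blanks=30):
    """Delete every n-th word, leaving the first and last sentences intact.

    Returns the cloze text and the list of deleted words (the exact-word key).
    """
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    if len(sentences) < 3:
        raise ValueError("need at least three sentences")
    first, middle, last = sentences[0], sentences[1:-1], sentences[-1]

    words = " ".join(middle).split()
    answers = []
    for i in range(n - 1, len(words), n):
        if len(answers) == max_blanks:
            break
        # Keep surrounding punctuation in the text; delete only the word itself.
        lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", words[i]).groups()
        answers.append(core)
        words[i] = f"{lead}({len(answers)})______{trail}"
    return " ".join([first, " ".join(words), last]), answers
```

With the default values, estimated_length() gives 30 x 7 + 40 = 250 words, which matches the figure given above. A real passage would still have to be checked for difficulty with a readability scale before use.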

The development of the test necessitates scoring procedures after the test is
administered. In the cloze test, too, a simple and objective scoring technique had to be
developed. Therefore, to facilitate the scoring procedure, scholars considered each
blank as an item and developed various scoring methods, among which the “exact
word method” and the “acceptable word method” are very common and frequently
used.

In the “exact word method”, an item is given a point if and only if the originally
deleted word is provided by the examinee. Although this method makes the
examinee’s task quite difficult, it is often employed in non-native environments. In
the “acceptable word method”, on the other hand, a supplied word will be considered
correct, and thus given credit, if it is acceptable in the context of the passage. That is,
if the supplied word makes the context meaningful, it will be considered as the correct
response.

Although research results indicate that there is no significant difference between the
two methods of scoring, the “acceptable word method” has proven to be more suitable
for the examinees. The major difficulty in this method, however, concerns the
identification of acceptable words for a given blank. The most practical way to
determine the acceptable words is to pretest the cloze test with a sufficient number of
native or native-like speakers. However, native speakers may not be readily available
in a non-native environment. On the other hand, the “exact word method”,
though difficult for the test takers, does not need the pre-testing of the test with native
speakers. Therefore, in EFL situations, the “exact word method” is recommended.
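
The difference between the two scoring methods can be illustrated with a short Python sketch. The function name, the case-insensitive comparison, and the lists of acceptable words are assumptions made for illustration only. In practice the acceptable words would come from pretesting with native or native-like speakers, as described above.

```python
def score_cloze(responses, answer_key, acceptable=None):
    """Count the blanks that earn a point.

    responses  -- words supplied by the examinee, one per blank
    answer_key -- the originally deleted words (exact word method)
    acceptable -- optional list of sets of additional acceptable words,
                  one set per blank (acceptable word method)
    """
    score = 0
    for i, (given, exact) in enumerate(zip(responses, answer_key)):
        # Assumption: capitalization and surrounding spaces are ignored.
        given = given.strip().lower()
        allowed = {exact.lower()}
        if acceptable is not None:
            allowed |= {word.lower() for word in acceptable[i]}
        if given in allowed:
            score += 1
    return score


# Key for the five-blank sample passage given earlier in this section.
key = ["is", "freshmen", "problems", "He", "things"]
responses = ["is", "students", "troubles", "he", "things"]

# Hypothetical acceptable alternatives, invented for this example.
alternatives = [set(), {"students"}, {"troubles", "difficulties"}, set(), {"chores"}]

exact_score = score_cloze(responses, key)
acceptable_score = score_cloze(responses, key, alternatives)
```

For this examinee the exact word method credits three blanks, whereas the acceptable word method credits all five, which shows why the latter is easier on the test takers.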

The cloze procedures explained here are referred to as the “standard cloze” in
open-ended form. A different version of the standard cloze is in the multiple-choice
format. In this type, four choices are provided for each blank, and the examinee is
required to choose the most appropriate word from among the given alternatives. Like
other recognition tests, the multiple-choice cloze assesses the examinees’ passive
knowledge of the language, whereas the open-ended cloze measures examinees'
productive linguistic abilities. It should be pointed out that the multiple-choice cloze
tests are easier to take than the open-ended ones because the nature of production
tasks requires a higher level of competency than recognition activities.

Using either form, language testers are advised to employ standard forms of the
cloze as testing instruments. Other varieties of cloze procedures are considered useful
activities for instructional purposes. They should not, however, be used as testing
devices. Some varieties of cloze procedure for classroom activities, referred to as
“alternative cloze”, follow:

1. A sentence or a set of sentences each with a word deleted.
2. A short paragraph with the words of a certain grammatical class such as articles,
prepositions, verb forms, etc., deleted.
3. A passage with certain deletions to be filled in with words from a list given to the
student.

In addition to these sample varieties, any cloze procedure that does not follow the
principles of standard cloze would be considered an alternative cloze. Alternative
cloze tests, as mentioned before, do not constitute reliable or valid measurement
devices. Therefore, they should be used in informal situations but not as testing
instruments.

It should also be kept in mind that passages from different scientific areas such as
humanities, engineering, medicine, etc., can be easily developed as cloze tests for the
students in their respective majors. In addition, the number of deletions, the number of
words between deletions, the kinds of words to be deleted, and the difficulty
level of the passage can all be varied, which gives teachers and educators great
flexibility in employing the cloze procedures. Therefore, cloze tests can serve as a
versatile device for both instructional and evaluative purposes.

Dictation

The other integrative test, dictation, is one of the old instruments for measuring
language ability. Unfortunately, however, it did not receive any serious attention until
the late sixties for two major reasons. First, early scholars claimed that
dictation was not an economical test; in addition, the scoring procedure for dictation
was not objective. Second, some testers misused the dictation because they did not
pay attention to the concept and purpose of dictation. They used dictation as a spelling
test, which was completely against the principles of integrative testing in general, and
of dictation in particular.

During the last two decades, fortunately, testing specialists observed the utility of
dictation tests, and thus, dictation obtained its deserving position among other tests.
Dictation became one of the most highly respected integrative measures of language
ability. Research on dictation has also demonstrated high validity and reasonable
reliability for such tests.

As with cloze tests, certain criteria have been set for the so-called “standard
dictation”. A standard dictation is a passage of appropriate length (usually 100-150
words) with reasonable difficulty (determined by readability scales) read three times
in the following manner.

In the first reading, the passage is read, preferably on tape, at the normal rate of
speech. In this stage, the examinees only listen to get the general idea of the passage.
They are not allowed to write anything down at this step.

In the second reading, the passage is read at the normal rate of speech, with
sufficient pauses at appropriate points and with punctuation marks supplied. During the
pauses, the examinees are required to write down the chunks of the language they
hear. The length of time for each pause should be determined in advance according to
the number of words within the chunk to be written down. The following
example demonstrates the places where pauses should be exercised.

It is often observed / that university students / have more problems / than those in high school.

It should be clear that the pause should be given at the point at which the natural
reading process requires it.
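
Since the pause after each chunk depends on the number of words in it, the pause plan for a passage can be worked out in advance. The following Python sketch shows one way of doing so. The function name and the rate of two seconds per word are hypothetical values chosen only for illustration; the appropriate rate would have to be set by the tester.

```python
def pause_plan(marked_passage, seconds_per_word=2.0):
    """Split a passage at '/' marks and attach a pause length to each chunk.

    seconds_per_word is an assumed writing allowance, not an established value.
    """
    chunks = [chunk.strip() for chunk in marked_passage.split("/") if chunk.strip()]
    return [(chunk, len(chunk.split()) * seconds_per_word) for chunk in chunks]


sentence = ("It is often observed / that university students / "
            "have more problems / than those in high school.")
plan = pause_plan(sentence)
```

Under the assumed rate, the four chunks of the sample sentence receive pauses of 8, 6, 6, and 10 seconds, respectively.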

In the third reading, the passage is read as it was in the first reading. The purpose of
this last reading is to give the examinees a chance to correct the words or to write
down the words they might have missed in the previous readings.

After the dictation is administered, it should be scored objectively. In scoring dictation
tests, the following points should be taken into account:

1. Every word is considered as an item.
2. Every morphologically correct word is given a point.
3. Spelling does not count as long as the meaning of the word is preserved. That is, if
a word such as “ship” is written as “sheep,” it would be considered wrong because
the meaning of the word is changed. However, if a word such as “beautiful” is
written as “beutiful,” it would be considered correct and thus given a point
because the spelling error does not change the meaning of the word.
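
These scoring rules can be approximated in a short Python sketch. It rests on several simplifying assumptions that are not part of the procedure described here: the examinee's words are taken to be aligned position by position with the passage, a small word list (lexicon) is used to detect that a written form is a different real word, and a similarity threshold of 0.8 stands in for the scorer's judgment that a misspelling preserves the meaning.

```python
from difflib import SequenceMatcher


def score_dictation(reference, response, lexicon, threshold=0.8):
    """Score a dictation word by word, each word counting as one item.

    reference -- words of the dictated passage
    response  -- words written by the examinee, assumed aligned with reference
    lexicon   -- set of known words, used to detect a change of meaning
    threshold -- assumed similarity cut-off for tolerating a misspelling
    """
    score = 0
    for target, written in zip(reference, response):
        target, written = target.lower(), written.lower()
        if written == target:
            score += 1
        elif written in lexicon:
            # A different real word (or word form): the meaning has changed.
            continue
        elif SequenceMatcher(None, target, written).ratio() >= threshold:
            # A misspelling that is still recognizably the intended word.
            score += 1
    return score


lexicon = {"the", "ship", "sheep", "is", "beautiful"}
reference = ["the", "ship", "is", "beautiful"]
response = ["the", "sheep", "is", "beutiful"]
points = score_dictation(reference, response, lexicon)
```

In this example three of the four words are credited: “sheep” is rejected because it is a different word, while the misspelled form of “beautiful” is accepted. A human scorer would still be needed for doubtful cases.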

It should be emphasized that ignoring some of the spelling errors in dictation does not
imply, by any means, that spelling is not important in language teaching or language
testing. Spelling requires a long time to be mastered through tedious work on the parts
of both the teacher and the student. The main point, however, is that using dictation
for spelling purposes is unacceptable because it would serve neither the purpose of
dictation nor that of spelling. Therefore, these two tests, dictation, and spelling, should
be kept quite separate from one another and used appropriately.

Cloze and dictation-type tests have been used widely as integrative measures of
language proficiency. It is quite possible, however, that some teachers may not be
familiar with such tests. It is therefore recommended that teachers familiarize
themselves with these new tests and utilize such techniques in the classroom. This
would serve two purposes. First, students will become familiar with these new types
of language tasks; and second, students will benefit from the instructional values of
these activities. These procedures can be used as effective exercises to teach language
in its natural form without decomposing it into discrete items. Of course, these
developments do not mark the end of improvements in language testing, because
language teaching theories continue to change and testing procedures are modified
accordingly. The latest modifications in testing and teaching theories are presented
below.

Notional-Functional Approach

Modifications in language teaching led to the development of a new approach referred
to as the notional-functional approach (NFA). Although the NFA maintains the basic
linguistic and psychological principles of the cognitive-code learning theory, it
differs from that theory in two major respects. First, in the NFA, great attention is paid
to the social appropriateness of sentences and utterances. That is, the grammatical
correctness of a sentence is necessary but not sufficient for that sentence to be used in
a communicative setting. Accordingly, an utterance must be both linguistically
accurate and socially appropriate.

Second, the NFA considers language as communicative chunks called “functions”.
Functions refer to what people do with language in real communication settings. For
example, people use language to seek information, apologize, persuade others, and so
forth. These functions are carried out utilizing linguistic elements that are called
“notions”. Thus, the NFA assumes that language consists of certain functions to be
fulfilled through certain linguistic structures, i.e., notions.

This new approach to language teaching necessitated a new approach to language
testing. The method developed to follow the principles of the NFA is referred to as
functional testing, the main objective of which is to assess learners’ ability to carry
out language functions.

To develop a functional test item, in multiple-choice form for example, a real
language context based on a certain function should be constructed as the stem.
Then, the alternatives should be developed through certain pre-testing steps. To
explain the characteristics of a functional test item and the steps followed in
developing it, an example will be helpful. Consider the following item:

You are applying to a university and need a letter of recommendation. You go to a professor, who is
also your friend, and say:
a. I’d appreciate it if you could write a letter of recommendation for me.
b. I want you to write a letter of recommendation for me.
c. I wonder if you can write a letter of recommendation for me.
d. Hey, give me a recommendation letter.

This functional item has the following characteristics, which set it apart from other
types of test items:

1. The function to be fulfilled in this item is “getting things done,” and in this
particular case, “requesting someone to do something.” The stem
demonstrates the function: someone, i.e., the student, wants someone else,
i.e., the professor, to do something, i.e., write a letter of recommendation.

2. The social setting or the communicative context in which the function is to be
fulfilled is an academic environment because it is to take place in a university.

3. The social relationship between the people involved in communication is friendly,
as stated in the stem.

4. The social status of these participants is unequal because one of them is a student
and the other is a professor.

5. The first alternative, which is linguistically accurate and socially appropriate, is
based on the performance of native speakers of English, i.e., the most frequent
statement produced by native speakers in this particular situation.

6. The second alternative, which is linguistically accurate but socially inappropriate, is
based on the performance of non-native speakers who have not lived in
English-speaking communities, but have received a lot of formal instruction.

7. The third alternative, which is linguistically incorrect but socially appropriate, is
based on the performance of non-native speakers who have lived in
English-speaking communities for a long time, but have not mastered the linguistic
rules of the language.

8. The last alternative, which is the only distractor, is neither linguistically accurate
nor socially appropriate.

These characteristics enable functional tests to measure the communicative abilities
as well as the linguistic abilities of the examinees. Moreover, these tests are more
suitable for measuring special functions in different fields of ESP.

Summary and Conclusion

In this section, theories of language testing were discussed. It was mentioned that
testing theories follow teaching methodologies. The following table illustrates this
correspondence.

Teaching Method                 Testing Method

Grammar Translation             Translation; Essay-Type Items
Audio-Lingual Approach          Discrete-Point Items
Cognitive-Code Learning         Integrative Test Items
Notional-Functional Approach    Functional Test Items

The close relationship between teaching and testing methods dictates some
pedagogical implications. That is, a particular teaching method, utilized in an
educational setting, requires teaching materials to be developed based on the
principles of that method. More importantly, the testing approach should be in
harmony with the teaching method employed.

Consider an educational setting in which instructional materials are prepared based on
the cognitive-code theory, taught based on the grammar-translation method, and tested
through discrete-point items. Such an unfortunate situation would most probably lead
the educational program to fail because there is no relationship between the method of
teaching, the procedures for materials development, and the testing method used.

To achieve instructional objectives, there should be a close relationship between these
objectives and the instructional materials. Furthermore, the materials should be taught
through methods corresponding to the principles upon which the materials are
prepared. Finally, the achievement of the students should be measured with
testing methods that correspond to the principles of the teaching method
employed. Otherwise, the instructional program is not likely to succeed in achieving
the preplanned objectives.

* This is the revised version of the paper printed in Roshd Foreign Language Teaching Journal (1986).
2 (4). Tehran, Iran.
