DISCO: Development and Integration of Speech Technology Into Courseware For Language Learning



Catia Cucchiarini, Joost van Doremalen, and Helmer Strik

Department of Linguistics, Radboud University Nijmegen, The Netherlands


[C.Cucchiarini | J.vanDoremalen | H.Strik]@let.ru.nl

Abstract
Recent research has shown that a properly designed ASR-based CALL system (Dutch-CAPT) was capable of detecting pronunciation errors and of providing comprehensible feedback on pronunciation. Since pronunciation is not the only skill required for speaking a second language, we explored the possibility of extending the Dutch-CAPT approach to other aspects of speaking proficiency, such as morphology and syntax. In this paper we explain how a number of errors in morphology and syntax that are common in spoken Dutch L2 could be addressed in an ASR-based CALL system. Finally, we present our new project, in which corrective feedback will be provided on all three aspects of spoken proficiency: pronunciation, morphology and syntax.
Index Terms: pronunciation training, CALL, ASR, error detection.

1. Introduction
One-on-one interactive learning with corrective feedback (CF) is known to be optimal for language learners. The two sigma benefit demonstrated by Bloom [1] has provided further support for the advantages of one-on-one tutoring relative to classroom instruction. However, one-on-one tutoring by trained language instructors is costly and therefore not feasible for the majority of language learners. In the classroom, providing individual CF is not always possible, mainly due to lack of time. This particularly applies to oral proficiency, where CF has to be provided immediately after the utterance has been spoken, which makes it even more difficult to provide sufficient practice in the classroom.
The emergence of Computer Assisted Language Learning (CALL) systems that make use of Automatic Speech Recognition (ASR) seems to offer new perspectives for training oral proficiency. These systems can offer extra learning time and material, specific feedback on individual errors and the possibility to simulate realistic interaction in a private and stress-free environment. However, existing CALL systems hardly begin to fulfill these requirements. We believe that this is due to the lack of proper detection of performance problems, coupled with feedback that is embedded in a realistic communicative setting and that helps learners to effectively improve their performance.
Recent research has shown that a properly designed ASR-based CALL system is capable of detecting pronunciation errors and of providing comprehensible CF on pronunciation [2]. This system, called Dutch-CAPT, was designed to provide CF on a selected number of speech sounds that had appeared to be problematic for learners of Dutch from various first language (L1) backgrounds [3]. The results showed that for the experimental group that had been using the CALL system for four weeks, the reduction in the pronunciation errors addressed in the training system was significantly larger than in the control group [2].
These results are promising and show that it is possible to use speech technology in CALL applications to improve speaking proficiency. In the Netherlands, speaking proficiency plays an important role within the framework of civic integration examinations. Foreigners who wish to acquire Dutch citizenship have to show that they are able to get by in Dutch society and that they speak the Dutch language at the Common European Framework (CEF) A2 level, which means that they can make themselves understood in Dutch and that others understand what they say. For instance, they must be able to pay for their purchases in the supermarket or buy a train ticket.
However, pronunciation is only one of the skills required for speaking a second language. There are also other aspects of spoken language that are important and that have to be mastered in order to be comprehensible and proficient in a second language. For instance, morphology and syntax also play an important role in language comprehension and language learning. It is known that learners tend to make different morphological and syntactic mistakes when they speak than when they write. It is generally acknowledged in the second language (L2) literature that the fact that L2 learners are aware of certain grammatical rules (i.e. those concerning subject-verb agreement in number, tenses of strong and weak verbs, and plural formation) does not automatically entail that they also manage to marshal this knowledge online while speaking. In other words, in order to learn to speak properly in a second language, L2 learners need to practice speaking and need to receive CF on their performance online, not only on pronunciation, but also on morphology and syntax.
A CALL system that is able to detect errors in speaking performance, point them out to the learners and give them the opportunity to try again until they manage to produce the correct form would be very useful, because in L2 classes there is not enough time for this type of practice and feedback. We therefore decided to explore the possibility of extending the approach adopted in Dutch-CAPT to other aspects of speaking proficiency, such as morphology and syntax; the results are presented in this paper. It turned out that there are a number of errors in morphology and syntax that are common in spoken Dutch L2 and that could be addressed in an ASR-based CALL system. In this paper we first describe these errors, then we explain how these problematic aspects could be addressed in a CALL system, and finally we present our new project, in which CF will be provided on all three aspects of spoken proficiency: pronunciation, morphology and syntax.

Copyright © 2008 ISCA
Accepted after peer review of full paper
September 22-26, Brisbane, Australia
2. Morphological and syntactic errors in spoken Dutch L2

2.1. Morphological errors in spoken Dutch L2
Problems with morphology are persistent in L2 learning [4], and phonetic-phonological properties play a prominent role in this learning process. As stated in [4]: "The meaning of morphemes and the distribution of their allomorphs cannot be acquired without the phonological capacity to extricate them from the flood of sounds in every sentence." To develop this capacity, learners first have to notice the contrast between their own erroneous realization (output) and the target form (input), as explained in Schmidt's Noticing Hypothesis [5].
Difficulties in learning Dutch verbal morphology are related to the perception and production of L2 phonemes such as schwa and /t/. As to perception, it is crucial to perceive the differences in (1) in order to understand the Dutch agreement paradigm, and those in (2) in order to understand the tense system (present vs. past tense).

(1) /maak/, /maakt/, /make(n)/
(2) /maakt/, /maakte/

On the production side, difficulties in pronouncing certain sound combinations may lead a Moroccan learner to say (3) when trying to pronounce /loopt/.

(3) /lopet/, /loopte/

2.2. Syntactic errors in spoken Dutch L2
In syntax, problems have been observed with word order, finite verb position, and pronominal subject omission. Owing to L1 transfer, Turkish learners are known to produce sentence-final verbs, as in (4), instead of the correct (5).

(4) * Jong mandarijn sneeuwman neus maakte.
      Boy tangerine snowman nose made.
      (intended form is: ‘maakt’)
(5) De jongen maakt met een mandarijn de neus van de sneeuwman.
    The boy makes with a tangerine the nose of the snowman.

A second difficult but basic syntactic phenomenon to acquire is the obligatory presence of the subject in Dutch. Pronominal subject omission (or subject pro-drop) is allowed in the L1 of many learners of Dutch and is frequently produced in early L2 developmental stages, as in (6a) and (6b). The subject in sentence-final position (6c) is another manifestation of the same pro-drop phenomenon. The correct form is given in (7).

(6) a. * loop(t) naar huis (typically Moroccan)
         walk(s) home
    b. * naar huis lopen (typically Turkish)
         home walk
    c. * loopt naar huis de jongen
         walks home the boy
(7) de jongen loopt naar huis
    the boy walks home

Another syntactic phenomenon known to be problematic for learners of Dutch L2 is Verb Second following an adverbial adjunct. Dutch is a verb-second language that requires subject inversion following an adverbial in initial position, as in (8b), but many learners construct an SVO clause, as in (8a).

(8) a. * dan hij gaat tv kijken
         then he goes tv watch
    b. dan gaat hij tv kijken
       then goes he tv watch

3. Extending Dutch-CAPT to morphology and syntax
It is well known that recognition of non-native speech is problematic. In the Dutch-CAPT system, recognition of the utterances was successful because we severely restricted the exercises and thus the possible answers of the learners. Confidence measures were then used to determine which of the utterances was spoken. In order to extend ASR-based feedback to morphology and syntax, it is necessary to design exercises that are appropriate for practicing these aspects of spoken proficiency on the one hand, but that are controlled enough to be handled by ASR on the other. For pronunciation, it is possible to use imitation and reading exercises, and these can be handled by ASR because the vocabulary is known in advance. For morphology and syntax such exercises cannot be used, because learners then have no freedom to show whether they are able to produce the correct forms. The required exercises therefore have to allow learners some freedom in formulating their answers, while remaining predictable enough to be handled by ASR. To this end, we explored whether it would be possible to design exercises that comply with these requirements. We found that suitable exercises can be designed by stimulating students to produce utterances containing the required morphological and syntactic forms, by showing them words on the screen without declensions, or pictograms, possibly in combination with figures representing scenes (e.g. a girl reading a book). In addition, as in Dutch-CAPT, use can be made of dialogues and scenarios illustrating so-called "crucial practice situations" (in Dutch cruciale praktijksituaties, or CPS), which correspond to realistic situations in which learners might find themselves in Dutch society and in which they have to interact with other citizens. These CPSs form the basis of the various civic integration examinations. The students can be asked to play a certain dialogue by using simple prompts concerning the vocabulary to be used, and they have to formulate the correct sentences themselves.
In these exercises, realistic communicative situations can be presented and the learners have the opportunity of performing realistic tasks. They receive prompts as to the words they have to use, so that vocabulary can be anticipated for ASR, but they have to produce the grammatically correct forms themselves, so that morphology or syntax can be tested and practiced. For morphology, a picture of a person performing a certain task or action is shown on the screen, the student receives prompts as to the words (i.e. verbs in infinitive form) to be used, and he or she has to speak a complete sentence with the correct forms of verbs and nouns. Such an exercise can also be used for syntax, to check whether the pronominal subject is being used appropriately or whether words are being used in the right order. For the latter aspect, cloze exercises can be designed in which an incomplete utterance is shown on the screen.
In such an exercise, one word of the utterance is missing; that word is displayed elsewhere on the screen, and the learner has to speak the complete utterance with the word inserted in the right place.

3.1. Error detection for morphology and syntax
For detecting morphological and syntactic errors, response expansion software can be used. This software takes appropriate responses as input and expands them to form pools of correct and incorrect responses. The software is based on modules for sentence and word expansion that have been developed by Polderland: the Polderland lemmatizer, the Polderland Part-of-Speech tagger, Lexpand (a product for the morphological expansion of lemmas to all possible word forms), and KLiP Thesaurus (a product for the semantic expansion of tokens, which resulted from another STEVIN project, "Rechtsorde").

3.1.1. Detecting syntactic errors
For detecting syntactic errors, it is sufficient to know which words were spoken and in which order. The speech recognition module determines which utterance was spoken, and the exercise database contains the syntactic errors for each utterance (generated by the response expansion module, e.g. pronominal subject omission, incorrect word order, etc.). Depending on which of the possible utterances has been recognized, the system can determine whether errors have been made with respect to, e.g., word order and/or pronominal subject omission.

3.1.2. Detecting morphological errors
For detecting morphological errors, the system should be able to distinguish, e.g., /maak/, /maken/, /maakte/ and /maakt/. Some of these variants are included in the list of possible responses, namely those that are related to frequent errors, that can be detected with sufficient reliability by means of confidence measures at the utterance or word level, and whose inclusion improves the performance of the speech recognition module. This already provides information on some morphological errors. However, our previous research made clear that for many of these pronunciation-related errors a more detailed analysis at the segmental level is needed.
To this end, an automatic segmentation at the phone level is made [2], [6], followed by a calculation of confidence measures for the individual phones. Criteria similar to those described in [2] can be used to select the phones/errors to be addressed. In short, the focus has to be on errors that are frequent, salient and persistent, and that can be detected with sufficient reliability. In the Dutch-CAPT system we employed the Goodness Of Pronunciation (GOP) score [7]. A GOP score is a log-likelihood ratio that can be calculated with the same algorithm for every phone; this score then has to be compared with a phone-specific threshold to determine whether the pronunciation was correct or not [2], [7]. We also experimented with acoustic-phonetic classifiers for error detection [8]. In [8] we compared the two techniques and found that the acoustic-phonetic classifiers performed better. We now intend to study what works best: GOP, acoustic-phonetic classifiers, or a combination of the two. Since classifiers have been developed for only a small number of phonetic contrasts, additional classifiers need to be developed for those phonetic contrasts that are relevant in this context.

4. Development and Integration of Speech technology into COurseware for language learning (DISCO)
The idea of extending the Dutch-CAPT approach to morphology and syntax by using the exercises and the detection techniques described above was elaborated in a research proposal named DISCO, which was eventually financed within the framework of the Dutch-Flemish stimulation programme for HLT called STEVIN. The aim of the DISCO project is to develop a prototype of an ASR-based CALL application for Dutch as a second language (DL2). The application optimizes learning through interaction in realistic communication situations and provides intelligent feedback on important aspects of DL2 speaking, viz. pronunciation, morphology, and syntax. The application should be able to detect and give feedback on errors that are made by learners of Dutch as a second language.
With respect to pronunciation, we aim at intelligibility rather than accent-free pronunciation. As a consequence, the system will primarily target those aspects that appear to be most problematic. In previous research [3] we gathered relevant information in this respect. The pronunciation exercises will address the sounds that were trained in [2] and some additional problematic sounds.

4.1. Design
A general framework for implementing and testing communicative CALL exercises is being developed. The client-server architecture integrates an ASR module and several modules for further processing of the ASR output, in an environment in which media content can be re-used to develop exercises. The system also supports a simple mechanism for the generation of feedback, and it comes with a tool that supports the implementation of new exercises on the basis of existing media content. As in Dutch-CAPT, use will be made of media content from the Nieuwe Buren program [2], which will be adapted to suit the aims of DISCO. In DISCO the ASR module will be based on SPRAAK, the result of another STEVIN project.
All courseware is stored in a database. It consists of the course structure, the course material to be presented to the user (consisting of moving images, pictures, texts and sounds), and exercise details: content, expected responses, and feedback information. Tools are provided to fill the courseware database and to automatically expand expected responses. User performance and progress information is stored in a second database.
The courseware application is realized as a client/server application, which enables realization as a stand-alone as well as a web-based version. All logic functionality is located in the server; the "thin" client contains GUI representation software and user-server communication functions.
The server contains a module to handle interaction with the client. A second module, the course and exercise logic handling module, guides the user through the course and presents course material, including exercises, to the user. It collects user responses to exercises, has them processed, and tracks user progress.
User responses to exercises are forwarded to the speech recognition module, which uses the courseware and exercise database to check for matches with expected correct or incorrect responses. When a response has been identified, the diagnostic modules are activated.
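The processing chain described in Sections 3.1.1, 3.1.2 and 4.1 (match the recognized utterance against expected responses, read off the error labels attached to it, then check selected phones against phone-specific GOP thresholds) can be illustrated with a minimal Python sketch. This is not DISCO's implementation: the response table, error labels, threshold values and the `PhoneSegment` fields are all hypothetical, and `gop` is a simplified, duration-normalised log-likelihood ratio in the spirit of [7], assuming the recognizer already supplies forced-alignment and free phone-loop log-likelihoods per segment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical expected-response table for one exercise, as response expansion
# might produce it: word sequence -> error labels (empty list = correct answer).
EXPECTED: Dict[Tuple[str, ...], List[str]] = {
    ("de", "jongen", "loopt", "naar", "huis"): [],
    ("de", "jongen", "loop", "naar", "huis"): ["verb_agreement"],
    ("loopt", "naar", "huis"): ["subject_omission"],
    ("loopt", "naar", "huis", "de", "jongen"): ["subject_sentence_final"],
}

# Hypothetical phone-specific GOP thresholds ("@" = schwa in SAMPA).
# Real values would have to be tuned on annotated non-native speech.
THRESHOLDS: Dict[str, float] = {"t": 1.5, "@": 2.0}


@dataclass
class PhoneSegment:
    phone: str        # target phone from forced alignment
    target_ll: float  # log-likelihood of the segment given the target phone
    best_ll: float    # best log-likelihood from an unconstrained phone loop
    n_frames: int     # segment duration in frames


@dataclass
class Diagnosis:
    response_errors: List[str] = field(default_factory=list)
    flagged_phones: List[str] = field(default_factory=list)


def gop(seg: PhoneSegment) -> float:
    """Duration-normalised log-likelihood ratio; 0 means the target phone fits best."""
    best = max(seg.best_ll, seg.target_ll)
    return (best - seg.target_ll) / max(seg.n_frames, 1)


def diagnose(words: List[str], segments: List[PhoneSegment],
             expected=EXPECTED, thresholds=THRESHOLDS) -> Optional[Diagnosis]:
    """Return a Diagnosis, or None when the response is not an expected one."""
    key = tuple(w.lower() for w in words)
    if key not in expected:
        return None  # no expected response matched: ask the learner to retry
    result = Diagnosis(response_errors=list(expected[key]))
    for seg in segments:
        limit = thresholds.get(seg.phone)
        # Only phones that have a threshold (i.e. were selected) are checked.
        if limit is not None and gop(seg) > limit:
            result.flagged_phones.append(seg.phone)
    return result


if __name__ == "__main__":
    segs = [PhoneSegment("t", target_ll=-80.0, best_ll=-60.0, n_frames=8)]
    print(diagnose(["Loopt", "naar", "huis"], segs))
    # Diagnosis(response_errors=['subject_omission'], flagged_phones=['t'])
```

Keeping the syntactic and morphological labels in the response table means that recognizing the utterance already yields those diagnoses; only the segment-level check needs acoustic scores. Replacing `gop` with an acoustic-phonetic classifier, as in [8], would leave the rest of the sketch unchanged.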
The diagnostic modules validate the speech realization quality and the morpho-syntactic quality of the user's response. Depending on the results of the validation, a proper feedback form is selected and passed on to the course and exercise logic handling module. Following an update to the user performance database, feedback is forwarded to the client. When the speech recognition software fails to identify one of the expected responses, an appropriate message is passed on to the user, and the user is asked to retry.

4.2. Feedback
Feedback is provided on two levels: (1) on the utterance level, and (2) on the error level. Regarding the former, the speech recognition module determines which utterance was spoken, and before proceeding to the error detection module, the learner is given feedback on the recognized utterance. After all, it would be highly confusing if learners got feedback on (parts of) an utterance that they had not spoken at all. Only after the learner has indicated that the utterance has been recognized correctly does the system proceed to error detection. On the other hand, if the utterance cannot be recognized, the learner will get a message that the system cannot process the utterance.
For providing CF we adopt a user interface based on the one developed for Dutch-CAPT, extended to provide CF on morphological and syntactic errors. The exact form of feedback on this latter type of errors will be chosen on the basis of pilot experiments in which different formats will be tested.
In the preliminary research we carried out while preparing the research proposal, a limited number of experienced teachers were asked to indicate how they provide feedback in specific situations. One method that appears to be very effective for providing feedback on syntax is the use of gestures that refer to specific syntactic errors. In the DISCO project, the effectiveness of this type of CF will be tested by using pictograms that refer to such gestures. In addition, more experienced teachers will be asked to indicate how feedback could best be provided in specific situations. On the basis of their input, rules will be defined and implemented in the system. Feedback can consist of textual and graphical information rendered on the screen. For example, if a morpho-syntactic error is detected, DISCO will display the correct form with the errors highlighted. In all cases the student will eventually have the opportunity to listen to a correct version of the response.
For each utterance, feedback will be provided on a limited number of errors (for instance, at most three or four), and these errors will be selected on the basis of a number of criteria. In any case, feedback will be provided only on those errors that can be detected with an acceptable degree of reliability.
In this respect, it is important to mention that in this project we will follow the approach adopted in Dutch-CAPT with respect to false detections. As is well known, there is a trade-off between false accepts (FAs, accepting an error as correct) and false rejects (FRs, rejecting something that was actually correct). In Dutch-CAPT we decided to minimize FRs and tolerate some FAs, on the grounds that for learners erroneously rejecting correct realizations would be more detrimental than erroneously accepting incorrect ones. This will enable learners to concentrate only on the most serious errors and to gain self-confidence, while minimizing the number of times feedback is erroneously given on correct realizations.

4.3. Evaluation
Evaluation will take place at several times and at several levels. Four pilot experiments will be carried out, aimed at testing the exercises, the speech recognition module, the error detection module, and the whole system, respectively. The latter is a preparation for the final evaluation of the whole system.
A system that gives meaningful feedback must operate in a manner that is similar to what a competent teacher would do. Therefore, for the final evaluation of the whole system we propose a design in which different groups of DL2 students use the system and fill in a questionnaire with which we can measure the students' satisfaction in working with the system. DL2 teachers will then assess all sets of system prompt, student response and system feedback for the quality of the feedback on the levels of pronunciation, morphology and syntax. For this purpose, recordings will be made of students who complete the exercises developed to test the DISCO system.
Given the evaluation design sketched above, we consider the project successful from a scientific point of view if the DL2 teachers agree that the system behaves in a way that makes it as useful for the students as a teacher is, and if the students rate the system positively on its most important aspects. From a valorization point of view, we consider the project successful if its results are taken up to develop applications.

5. Acknowledgements
Partners in the DISCO project are J. Colpaert (Linguapolis, University of Antwerp), J. Bakx (Universitair Taal- en Communicatiecentrum Nijmegen), and I. de Mönnink (Polderland Language & Speech Technology). The DISCO project is carried out within the STEVIN programme, which is funded by the Dutch and Flemish Governments (http://taalunieversum.org/taal/technologie/stevin/).

6. References
[1] Bloom, B. S., "The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring", Educational Researcher, 13, 4-16, 1984.
[2] Neri, A., Cucchiarini, C. and Strik, H., "The effectiveness of computer-based corrective feedback for improving segmental quality in L2-Dutch", ReCALL, 20(2), May 2008.
[3] Neri, A., Cucchiarini, C. and Strik, H., "Selecting segmental errors in non-native Dutch for optimal pronunciation training", International Review of Applied Linguistics, 44, 2006.
[4] DeKeyser, R., "What Makes Learning Second-Language Grammar Difficult? A Review of Issues", Language Learning, 55, S1, 1-25, 2005.
[5] Schmidt, R. W., "The role of consciousness in second language learning", Applied Linguistics, 11, 129-158, 1990.
[6] Franco, H., Neumeyer, L., Digalakis, V. and Ronen, O., "Combination of machine scores for automatic grading of pronunciation quality", Speech Communication, 30, 121-130, 2000.
[7] Witt, S. M. and Young, S., "Phone-level Pronunciation Scoring and Assessment for Interactive Language Learning", Speech Communication, 30(2), 95-108, 2000.
[8] Strik, H., Truong, K., de Wet, F. and Cucchiarini, C., "Comparing classifiers for pronunciation error detection", Proceedings of Interspeech 2007, Antwerp, Belgium, 1837-1840, 2007.
[9] Cucchiarini, C., Neri, A., de Wet, F. and Strik, H., "ASR-based pronunciation training: scoring accuracy and pedagogical effectiveness of a system for Dutch L2 learners", Proceedings of Interspeech 2007, Antwerp, Belgium, 2007.
