DISCO: Development and Integration of Speech Technology Into Courseware For Language Learning
utterance is shown on the screen with one word missing, that word is displayed somewhere else on the screen, and the learner has to speak the complete utterance with the word inserted in the right place.

3.1. Error detection for morphology and syntax

For detecting morphological and syntactic errors, response expansion software can be used. This software takes appropriate responses as input and expands them to form pools of correct and incorrect responses. The software is based on modules for sentence and word expansion that have been developed by Polderland: the Polderland lemmatizer, the Polderland Part-of-Speech tagger, Lexpand, a product for morphological expansion of lemmas to all possible word forms, and KLiP Thesaurus, a product for semantic expansion of tokens, which resulted from another STEVIN project, “Rechtsorde”.

3.1.1. Detecting syntactic errors

For detecting syntactic errors it is sufficient to know which words were spoken in which order. The speech recognition module determines which utterance was spoken, and the exercise database contains the syntactic errors for that utterance (generated by the response expansion module, e.g. pronominal subject omission, incorrect word order, etc.). Depending on which of the possible utterances has been recognized, the system can determine whether errors have been made with respect to, e.g., word order and/or pronominal subject omission.

3.1.2. Detecting morphological errors

For detecting morphological errors, the system should be able to distinguish, e.g., /maak/, /maken/, /maakte/ and /maakt/. Some of these variants are included in the list of possible responses, namely those that are related to frequent errors, that can be detected with sufficient reliability by means of confidence measures at the utterance or word level, and whose inclusion improves the performance of the speech recognition module. This already provides information on some morphological errors. However, our previous research made clear that for many of these pronunciation-related errors a more detailed analysis at the segmental level is needed.

To this end, an automatic segmentation at the phone level is made [2], [6], followed by a calculation of confidence measures for the individual phones. Criteria similar to those described in [2] can be used to select the phones/errors to be addressed. In short, the focus has to be on errors that are frequent, salient, persistent, and detectable with sufficient reliability. In the Dutch-CAPT system we employed the Goodness-Of-Pronunciation (GOP) score [7]. A GOP score is a log-likelihood ratio that can be calculated with the same algorithm for every phone; this score then has to be compared with a phone-specific threshold to determine whether the pronunciation was correct or not [2], [7]. We also experimented with acoustic-phonetic classifiers for error detection [8]. In [8] we compared the two techniques and found that the acoustic-phonetic classifiers performed better. We now intend to study what works best: GOP, acoustic-phonetic classifiers, or a combination of the two. Since classifiers have been developed for only a small number of phonetic contrasts, additional classifiers need to be developed for the phonetic contrasts that are relevant in this context.

4. Development and Integration of Speech technology into COurseware for language learning (DISCO)

The idea of extending the Dutch-CAPT approach to morphology and syntax by using the exercises and the detection techniques described above was elaborated in a research proposal named DISCO, which was eventually financed within the framework of the Dutch-Flemish stimulation programme for HLT called STEVIN. The aim of the DISCO project is to develop a prototype of an ASR-based CALL application for Dutch as a second language (DL2). The application optimizes learning through interaction in realistic communication situations and provides intelligent feedback on important aspects of DL2 speaking, viz. pronunciation, morphology, and syntax. The application should be able to detect and give feedback on errors that are made by learners of Dutch as a second language.

With respect to pronunciation, we aim at intelligibility rather than accent-free pronunciation. As a consequence, the system will primarily target those aspects that appear to be most problematic. In previous research [3] we gathered relevant information in this respect. The pronunciation exercises will address the sounds that were trained in [2] and some additional problematic sounds.

4.1. Design

A general framework for implementing and testing communicative CALL exercises is being developed. The client-server architecture integrates an ASR module and several modules for further processing of the ASR output, in an environment in which media content can be re-used to develop exercises. The system also supports a simple mechanism for the generation of feedback, and it comes with a tool that supports the implementation of new exercises on the basis of existing media content. As in Dutch-CAPT, use will be made of media content from the Nieuwe Buren program [2], which will be adapted to suit the aims of DISCO. In DISCO the ASR module will be based on SPRAAK, the result of another STEVIN project.

All courseware is stored in a database. It consists of the course structure, course material to be presented to the user (moving images, pictures, texts and sounds), and exercise details: content, expected responses, and feedback information. Tools are provided to fill the courseware database and to automatically expand expected responses. User performance and progress information are stored in a second database.

The courseware application is realized as a client/server application, which enables realization as a stand-alone as well as a web-based version. All logic functionality is located in the server; the “thin” client contains GUI representation software and user-server communication functions.

The server contains a module to handle interaction with the client. A second module, the course and exercise logic handling module, guides the user through the course and presents course material, including exercises, to the user. It collects user responses to exercises, has them processed, and tracks user progress.

User responses to exercises are forwarded to the speech recognition module, which uses the courseware and exercise database to check for matches with expected correct or incorrect responses. When a response has been identified, the diagnostic modules are activated to validate the speech
realization quality and the morpho-syntactic quality of the user’s response. Depending on the results of the validation, a proper feedback form is selected and passed on to the course and exercise logic handling module. Following an update to the user performance database, feedback is forwarded to the client. When the speech recognition software fails to identify one of the expected responses, an appropriate message is passed on to the user, and the user is asked to retry.

4.2. Feedback

Feedback is provided on two levels: (1) the utterance level and (2) the error level. Regarding the former, the speech recognition module determines which utterance was spoken, and before proceeding to the error detection module the learner is given feedback on the recognized utterance. After all, it would be highly confusing if the learner got feedback on (parts of) an utterance that he or she did not speak at all. Only after the learner has indicated that the utterance has been recognized correctly does the system proceed to error detection. If, on the other hand, the utterance cannot be recognized, the learner gets a message that the system cannot process the utterance.

For providing CF we adopt a user interface based on the one developed for Dutch-CAPT, extended to provide CF on morphological and syntactic errors. The exact form of feedback on this latter type of errors will be chosen on the basis of pilot experiments in which different formats will be tested.

In the preliminary research we carried out while preparing the research proposal, a limited number of experienced teachers were asked to indicate how they provide feedback in specific situations. One method that appears to be very effective for providing feedback on syntax is the use of gestures that refer to specific syntactic errors. In the DISCO project the effectiveness of this type of CF will be tested by using pictograms that refer to such gestures. In addition, more experienced teachers will be asked to indicate how feedback could best be provided in specific situations. On the basis of their input, rules will be defined and implemented in the system. Feedback can consist of textual and graphical information rendered on the screen. For example, if a morpho-syntactic error is detected, DISCO will display the correct form with the errors highlighted. In all cases the student will eventually have the opportunity to listen to a correct version of the response.

For each utterance, feedback will be provided on a limited number of errors (for instance, at most three or four), selected on the basis of a number of criteria. In any case, feedback will be provided only on errors that can be detected with an acceptable degree of reliability. In this respect, it is important to mention that in this project we will follow the approach adopted in Dutch-CAPT with respect to false detections. As is well known, there is a trade-off between false accepts (FAs, accepting an error as correct) and false rejects (FRs, rejecting something that was actually correct). In Dutch-CAPT we decided to minimize FRs and tolerate some FAs, on the grounds that for learners, erroneously rejecting correct realizations would be more detrimental than erroneously accepting incorrect ones. This enables learners to concentrate only on the most serious errors and to gain self-confidence, while minimizing the number of times feedback is given on an error that was not actually made.

4.3. Evaluation

Evaluation will take place at several times and at several levels. Four pilot experiments will be carried out, aimed at testing the exercises, the speech recognition module, the error detection module, and the whole system, respectively. The last of these is a preparation for the final evaluation of the whole system.

A system that gives meaningful feedback must operate in a manner similar to what a competent teacher would do. Therefore, for the final evaluation of the whole system we propose a design in which different groups of DL2 students use the system and fill in a questionnaire with which we can measure their satisfaction in working with the system. DL2 teachers will then assess each set of system prompt, student response and system feedback with respect to the quality of the feedback on pronunciation, morphology and syntax. For this purpose, recordings will be made of students who complete the exercises developed to test the DISCO system.

Given the evaluation design sketched above, we consider the project successful from a scientific point of view if the DL2 teachers agree that the system behaves in a way that makes it as useful for the students as a teacher is, and if the students rate the system positively on its most important aspects. From a valorization point of view, we consider the project successful if its results are taken up to develop applications.

5. Acknowledgements

Partners in the DISCO project are J. Colpaert (Linguapolis, University of Antwerp), J. Bakx (Universitair Taal- en Communicatiecentrum Nijmegen), and I. de Mönnink (Polderland Language & Speech Technology). The DISCO project is carried out within the STEVIN programme, which is funded by the Dutch and Flemish Governments (http://taalunieversum.org/taal/technologie/stevin/).

6. References

[1] Bloom, B. S. “The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring”, Educational Researcher, 13, 4-16, 1984.
[2] Neri, A., Cucchiarini, C. and Strik, H. “The effectiveness of computer-based corrective feedback for improving segmental quality in L2-Dutch”, ReCALL, 20(2), May 2008.
[3] Neri, A., Cucchiarini, C. and Strik, H. “Selecting segmental errors in non-native Dutch for optimal pronunciation training”, International Review of Applied Linguistics, 44, 2006.
[4] DeKeyser, R. “What Makes Learning Second-Language Grammar Difficult? A Review of Issues”, Language Learning, 55, S1, 1-25, 2005.
[5] Schmidt, R. W. “The role of consciousness in second language learning”, Applied Linguistics, 11, 129-158, 1990.
[6] Franco, H., Neumeyer, L., Digalakis, V. and Ronen, O. “Combination of machine scores for automatic grading of pronunciation quality”, Speech Communication, 30, 121-130, 2000.
[7] Witt, S. M. and Young, S. “Phone-level Pronunciation Scoring and Assessment for Interactive Language Learning”, Speech Communication, 30(2), 95-108, 2000.
[8] Strik, H., Truong, K., de Wet, F. and Cucchiarini, C. “Comparing classifiers for pronunciation error detection”, Proceedings of Interspeech 2007, Antwerp, Belgium, 1837-1840, 2007.
[9] Cucchiarini, C., Neri, A., de Wet, F. and Strik, H. “ASR-based pronunciation training: scoring accuracy and pedagogical effectiveness of a system for Dutch L2 learners”, Proceedings of Interspeech 2007, Antwerp, Belgium, 2007.