Studies in health technology and informatics, 2014
CogWatch is an assistive system to re-train stroke survivors suffering from Apraxia or Action Disorganization Syndrome (AADS) to complete activities of daily living (ADLs). This paper describes an approach to real-time planning based on a Markov Decision Process (MDP), and demonstrates through user simulation that it can improve task performance. The paper concludes with a discussion of the remaining challenges and future enhancements.
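To illustrate what MDP-based prompt planning involves, the sketch below runs value iteration on a small toy model. It is not the CogWatch model: the state space (number of task steps completed), the two actions (`wait`, `prompt`), the success probabilities, the costs and the rewards are all hypothetical numbers chosen for illustration. The solver weighs the cost of interrupting the user against the higher chance that the user makes progress after being prompted.

```python
from dataclasses import dataclass, field


@dataclass
class PromptingMDP:
    """Hypothetical toy MDP for illustration; not the CogWatch model."""
    n_steps: int = 4              # number of task steps to complete
    # Assumed probability that the user performs the correct next step under each action.
    p_success: dict = field(default_factory=lambda: {"wait": 0.6, "prompt": 0.9})
    step_cost: float = -1.0       # cost charged on every time step
    prompt_cost: float = -0.5     # extra cost of interrupting the user with a prompt
    goal_reward: float = 10.0     # reward for completing the final step
    gamma: float = 0.95           # discount factor

    def q_value(self, s, a, V):
        """Expected return of taking action a in state s, then following values V."""
        p = self.p_success[a]
        r = self.step_cost + (self.prompt_cost if a == "prompt" else 0.0)
        nxt = s + 1
        on_success = (self.goal_reward if nxt == self.n_steps else 0.0) + self.gamma * V[nxt]
        on_failure = self.gamma * V[s]  # an error leaves the user in the same state
        return r + p * on_success + (1 - p) * on_failure


def value_iteration(mdp, tol=1e-9):
    """Return state values and the greedy (optimal) action for each non-terminal state."""
    V = [0.0] * (mdp.n_steps + 1)  # V[n_steps] is the terminal state and stays 0
    while True:
        delta = 0.0
        for s in range(mdp.n_steps):
            best = max(mdp.q_value(s, a, V) for a in mdp.p_success)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(mdp.p_success, key=lambda a: mdp.q_value(s, a, V))
              for s in range(mdp.n_steps)]
    return V, policy


V, policy = value_iteration(PromptingMDP())
print(policy)  # ['prompt', 'prompt', 'prompt', 'prompt'] with these made-up numbers
```

With these assumed numbers, prompting is worth its cost in every state; raising the interruption cost (for example `PromptingMDP(prompt_cost=-5.0)`) makes `wait` optimal everywhere, which shows how the cost terms shape the policy.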
IFIP Advances in Information and Communication Technology, 2015
This paper presents a Partially Observable Markov Decision Process (POMDP) model for action planning and human error detection during Activities of Daily Living (ADLs). The model is integrated into a sub-component of an assistive system designed for stroke survivors, called the Artificial Intelligent Planning System (AIPS). The main goal of the AIPS is to monitor the user's history of actions during a specific task and to provide meaningful assistance when an error is detected in his/her sequence of actions. To do so, the AIPS must cope with ambiguity in the outputs of the system's other components. In this paper, we first give an overview of the global assistive system in which the AIPS is implemented, and explain how it interacts with the user to guide him/her during tea-making. We then define the POMDP models and the Monte Carlo algorithm used to learn how to retrieve optimal prompts and to detect human errors under uncertainty.
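The core operation that lets a POMDP cope with ambiguous component outputs is the belief update (a discrete Bayes filter). The sketch below is a minimal illustration under loudly stated assumptions: a hypothetical two-state model (`on_track`, `error`), a single system action `wait`, and an action recogniser whose output is either `expected` or `unexpected`. All probabilities are invented. It also replaces the learned policy (the paper uses a Monte Carlo algorithm) with a simple fixed threshold on P(error).

```python
def belief_update(belief, action, obs, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    new = {}
    for s_next in belief:
        predicted = sum(T[action][s][s_next] * p for s, p in belief.items())
        new[s_next] = O[action][s_next][obs] * predicted
    z = sum(new.values())
    if z == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {s: p / z for s, p in new.items()}


# Hypothetical transition model: errors persist until the user is prompted.
T = {"wait": {"on_track": {"on_track": 0.9, "error": 0.1},
              "error":    {"on_track": 0.0, "error": 1.0}}}
# Hypothetical noisy recogniser: reports whether the observed action was the expected one.
O = {"wait": {"on_track": {"expected": 0.8, "unexpected": 0.2},
              "error":    {"expected": 0.3, "unexpected": 0.7}}}

belief = {"on_track": 1.0, "error": 0.0}
for obs in ["unexpected", "unexpected"]:
    belief = belief_update(belief, "wait", obs, T, O)
    # Threshold rule standing in for a learned POMDP policy.
    decision = "prompt" if belief["error"] > 0.5 else "wait"
    print(obs, round(belief["error"], 3), decision)
```

With these numbers, a single unexpected observation only raises P(error) to 0.28, so the system keeps waiting; a second one raises it to about 0.655 and triggers a prompt. The point is that one ambiguous recogniser output is not treated as proof of an error.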
Accent is cited as an issue for speech recognition systems. If Automatic Speech Recognition (ASR) systems are to be widely deployed, they must deliver consistently high performance across user populations. Hence the development of accent-robust ASR is of significant importance. This research investigates techniques for compensating for the effects of accents on the performance of Hidden Markov Model (HMM) based ASR systems. Recently, HMM systems based on Deep Neural Networks (DNNs) have achieved superior performance to more traditional systems based on Gaussian Mixture Models (GMMs), due to the discriminative nature of DNNs. Our research confirms this by showing that a DNN system outperforms the GMM system even after an accent-dependent acoustic model was selected using Accent Identification (AID), followed by speaker adaptation. The average performance of the DNN system over all accent groups is maximized when either accent diversity is highest, or data from “difficult” accent-groups i...
SLaTE 2019: 8th ISCA Workshop on Speech and Language Technology in Education, 2019
This paper describes the systems developed by the University of Birmingham for the 2019 Spoken CALL Shared Task (ST) challenge. The task is automatic assessment of grammatical and semantic aspects of English spoken by German-speaking Swiss teenagers. Our system has two main components: automatic speech recognition (ASR) and text processing (TP). We use the ASR system that we developed for the 2018 ST challenge. This is a DNN-HMM system based on sequence training with the state-level minimum Bayes risk criterion. It achieved word error rates (WERs) of 8.89% on the ST2 test set and 10.94% on the ST3 test set. This paper focuses on development of the TP component. In particular, we explore machine learning (ML) approaches that preserve different degrees of word order. The ST responses are represented as vectors using Word2Vec and Doc2Vec models, and the similarities between ASR transcriptions and reference responses are calculated using Word Mover's Distance (WMD) and Dynamic Programming (DP). A baseline rule-based TP system obtained D_full scores of 5.639 and 5.476 on the ST2 and ST3 test sets, respectively. The best ML-based TP system, consisting of a Word2Vec model trained on the ST data, DP-based similarity calculation and a neural network, achieved D_full scores of 7.379 and 5.740 on the ST2 and ST3 test sets, respectively.
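To make the idea of an order-preserving DP similarity concrete, the sketch below uses a plain word-level Levenshtein distance with unit costs, normalised to a similarity in [0, 1], and picks the closest reference response. This is an assumed, simplified stand-in: the paper's DP similarity works with Word2Vec representations and its exact costs are not given here, and the function names and example sentences are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two word sequences (unit costs)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))            # distances from an empty hypothesis
    for i in range(1, len(h) + 1):
        cur = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            sub = 0 if h[i - 1] == r[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete h[i-1]
                         cur[j - 1] + 1,      # insert r[j-1]
                         prev[j - 1] + sub)   # match or substitute
        prev = cur
    return prev[-1]


def dp_similarity(hyp, ref):
    """Map the distance to a similarity in [0, 1]; 1 means identical word sequences."""
    n = max(len(hyp.split()), len(ref.split()), 1)
    return 1.0 - word_edit_distance(hyp, ref) / n


def best_reference(hyp, references):
    """Return the reference response most similar to the ASR transcription."""
    return max(references, key=lambda ref: dp_similarity(hyp, ref))


refs = ["i would like a room for two nights", "two nights please"]
hyp = "i want a room for two nights"
print(best_reference(hyp, refs), dp_similarity(hyp, refs[0]))  # i would like a room for two nights 0.75
```

Because the alignment is monotonic, word order matters: `word_edit_distance("a room", "room a")` is 2, whereas a bag-of-words comparison would treat the two strings as identical.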
Although it is generally accepted that different broad phone classes (BPCs) have different production mechanisms and are better described by different types of features, most automatic speech recognition (ASR) systems use the same features and decision criteria for all phones. Motivated by this observation, this paper proposes a two-level DNN structure, referred to as a BPC-DNN, inspired by the notion of a topological manifold. In the first level, several small separate BPC-dependent DNNs are applied to different broad phonetic classes, and in the second level the outputs of these DNNs are fused to obtain senone-dependent posterior probabilities, which can be used for frame-level classification or integrated into Viterbi decoding for phone recognition. In a previous paper using this approach we reported improved frame classification accuracy on the TIMIT corpus compared with a conventional DNN. The contribution of the present paper is to demonstrate that this advantage extends to full phone recognition. Our most recent results show that the BPC-DNN achieves reductions in error rate relative to a conventional DNN of 16% and 8% for frame classification and phone recognition, respectively.
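The two-level structure can be sketched as a forward pass: each broad phone class has its own small network, their outputs are concatenated, and a second-level layer maps the result to a softmax over senones. This is a hypothetical illustration only. The class inventory, layer sizes, single-layer fusion and the use of the same input frame for every class are assumptions (the paper motivates class-specific features), and the weights are random and untrained, so the posteriors are meaningless apart from being a valid distribution.

```python
import math
import random


def dense(x, W, b):
    """Affine layer W @ x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)                        # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    z = sum(e)
    return [a / z for a in e]


def init_layer(n_in, n_out, rng):
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class BPCDNNSketch:
    """Hypothetical two-level BPC-DNN forward pass with untrained random weights."""

    def __init__(self, n_feat, bpcs, n_hidden, n_bpc_out, n_senones, seed=0):
        rng = random.Random(seed)
        self.bpcs = list(bpcs)
        # Level 1: one small network (one hidden layer) per broad phone class.
        self.level1 = {c: (init_layer(n_feat, n_hidden, rng), init_layer(n_hidden, n_bpc_out, rng))
                       for c in self.bpcs}
        # Level 2: fuse the concatenated level-1 outputs into senone scores.
        self.level2 = init_layer(n_bpc_out * len(self.bpcs), n_senones, rng)

    def posteriors(self, frame):
        fused_input = []
        for c in self.bpcs:
            (W1, b1), (W2, b2) = self.level1[c]
            fused_input.extend(dense(relu(dense(frame, W1, b1)), W2, b2))
        W, b = self.level2
        return softmax(dense(fused_input, W, b))   # senone posterior probabilities


rng = random.Random(1)
model = BPCDNNSketch(n_feat=39, bpcs=["vowel", "stop", "fricative", "nasal", "silence"],
                     n_hidden=16, n_bpc_out=8, n_senones=20)
frame = [rng.gauss(0.0, 1.0) for _ in range(39)]
post = model.posteriors(frame)
best_senone = max(range(len(post)), key=post.__getitem__)  # frame-level classification
```

For phone recognition, the same posteriors would instead be passed (typically after dividing by senone priors) to a Viterbi decoder rather than taking a per-frame argmax.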
2019 27th European Signal Processing Conference (EUSIPCO)
Papers by Martin Russell