In this paper we determine how multi-layer ensembling improves performance on multilingual intent classification. We develop a novel multi-layer ensembling approach that ensembles both different model initializations and different model architectures. We also introduce a new banking domain dataset and compare results against the standard ATIS dataset and the Chinese SMP2017 dataset to determine ensembling performance in multilingual and multi-domain contexts. We run ensemble experiments across all three datasets, and conclude that ensembling provides significant performance increases, and that multi-layer ensembling is a no-risk way to improve performance on intent classification. We also find that a diverse ensemble of simple models can reach performance comparable to much more sophisticated state-of-the-art models. Our best F1 scores on ATIS, Banking, and SMP are 97.54%, 91.79%, and 93.55% respectively, which compare well with the state-of-the-art on ATIS and the best submission to the SMP2017 competition. The total ensembling performance increases we achieve are 0.23%, 1.96%, and 4.04% F1 respectively.
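The two-layer idea described above (first combining different initializations of one architecture, then combining across architectures) can be sketched as follows. This is a minimal illustration only: the abstract does not state how predictions are combined, so majority voting is an assumption here, and the architecture names and intent labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine several prediction lists into one by per-example majority vote.

    `predictions` is a list of equal-length label lists, one per model.
    """
    n_examples = len(predictions[0])
    voted = []
    for i in range(n_examples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def multi_layer_ensemble(runs_by_architecture):
    """Two-layer ensemble (assumed voting scheme, not the paper's exact method).

    Layer 1: vote over runs of the same architecture (different initializations).
    Layer 2: vote over the per-architecture results.
    """
    per_architecture = [majority_vote(runs) for runs in runs_by_architecture.values()]
    return majority_vote(per_architecture)


# Hypothetical predictions: three architectures, three initializations each,
# three test utterances with intent labels "flight" or "fare".
runs = {
    "cnn": [["flight", "fare", "flight"], ["flight", "fare", "fare"], ["flight", "flight", "fare"]],
    "lstm": [["flight", "fare", "fare"], ["fare", "fare", "fare"], ["flight", "fare", "fare"]],
    "gru": [["flight", "flight", "fare"], ["flight", "fare", "fare"], ["flight", "fare", "flight"]],
}
final_predictions = multi_layer_ensemble(runs)
```

Using an odd number of runs and architectures avoids ties in a two-class vote; with more classes a tie-breaking rule would need to be chosen explicitly.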
In this paper, we reveal the depreciation mechanism of representative money (banknotes) from the perspective of logistics warehousing costs. Although it has long been the dream of economists to stabilize the buying power of monetary units, the goal of honest money has always been broken, since central banks depreciate the currency without limit. From the point of view of modern logistics, the key functions of money are the store of value and low logistics (circulation and warehousing) cost. Although commodity money (such as gold and silver) has the advantage of storing wealth, its disadvantage is its high logistics cost. In comparison to commodity money, credit currency and digital currency cannot protect wealth from loss over a long period, while their logistics costs are negligible. We prove that, from the perspective of logistics costs, there is no honest money that is both a store of value like precious metal and free of logistics costs in circulation like digital currency. The hidden reason behind the depreciation of banknotes is the black hole of the anchor's storage charges over time after commodity money is digitized. Accordingly, it is not difficult to infer the inevitable collapse of the Bretton Woods system. Therefore, we introduce a brand-new currency named honest devalued stable-coin and build an attenuation model of the intrinsic value of the honest money based on the change mechanism of the storage cost of anchor assets, like gold, which will lay the theoretical foundation for a stable monetary system.
Spoken language understanding (SLU) is an important problem in natural language processing, which involves identifying a user's intent and assigning a semantic concept to each word in a sentence. This paper presents a word feature vector method and integrates it into a convolutional neural network (CNN). We consider 18 word features, and each word feature is constructed by merging similar word labels. By introducing the concept of an external library, we propose a feature set approach that is beneficial for building the relationship between a word from the training dataset and the feature. Computational results are reported using the ATIS dataset, and comparisons with a traditional CNN as well as a bi-directional sequential CNN are also presented.
Intent classification has been widely researched on English data with deep learning approaches that are based on neural networks and word embeddings. The challenge for Chinese intent classification stems from the fact that, unlike English, where most words are made up of 26 phonologic alphabet letters, Chinese is logographic: a Chinese character is a more basic semantic unit that can be informative, and its meaning does not vary much across contexts. Chinese word embeddings alone can be inadequate for representing words, and pre-trained embeddings can suffer from not aligning well with the task at hand. To account for this inadequacy and leverage Chinese character information, we propose a low-effort and generic way to dynamically integrate character embedding based feature maps with word embedding based inputs. The resulting word-character embeddings are stacked with a contextual information extraction module to further incorporate context information for predictions. On top of the proposed model, we employ an ensemble method to combine single models and obtain the final result. The approach is data-independent and does not rely on external sources like pre-trained word embeddings. The proposed model outperforms baseline models and existing methods.
Transportation Research Record: Journal of the Transportation Research Board, 2022
Circulation planning for electric multiple units (EMUs) is regarded as one of the key operation issues for a high-speed railway transportation system. The EMU circulation plan consists of determining the connections of trains while accomplishing the passengers’ demands. EMUs need regular maintenance at a certain interval of kilometers or minutes for safety reasons. Consequently, the circulation plan must ensure that EMU trains can reach the maintenance depots in time for their required maintenance. This paper proposes a 0-1 integer programming model for the EMU circulation plan with the aim of minimizing the total costs of the mileage losses of the EMUs, which are incurred when they undergo a maintenance check before the corresponding travel mileage reaches the limit of the cycle. Given that the accumulated travel mileage of EMUs is allowed to be 10% above the standard mileage cycle in practice, an ingenious fuzzy maintenance constraint is presented to tackle the mileage cycle constr...
Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise at SNRs of 20 to 5 dB. Existing ASR models perform well for the higher SNRs in this range, but degrade considerably with more noise. This work explores methods for providing improved ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance with SNR lower than 20 dB, demonstrating the feasibility of achieving relatively high-performing ASR with open-source toolkits and hundreds of hours of training data, which are commonly available.
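To make the SNR range above concrete, the sketch below scales a noise signal so that the mixture has a chosen SNR in dB, using the standard definition SNR = 10 log10(P_signal / P_noise). It is purely illustrative: the abstract does not say how the noisy conditions were produced, and the function names and sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Average power of a sequence of audio samples."""
    return sum(x * x for x in samples) / len(samples)


def noise_gain_for_snr(signal, noise, target_snr_db):
    """Gain to apply to `noise` so that signal-to-noise ratio equals target_snr_db."""
    ratio = 10 ** (target_snr_db / 10)  # linear power ratio P_signal / P_noise
    return math.sqrt(mean_power(signal) / (mean_power(noise) * ratio))


def mix_at_snr(signal, noise, target_snr_db):
    """Add scaled noise to the signal (both must have the same length)."""
    gain = noise_gain_for_snr(signal, noise, target_snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]


# Toy example (hypothetical sample values).
clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy_20db = mix_at_snr(clean, noise, 20)
noisy_5db = mix_at_snr(clean, noise, 5)
```

Lower SNR values correspond to a larger noise gain, so the 5 dB mixture carries noticeably more noise than the 20 dB one.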
Zara, or ‘Zara the Supergirl’, is a virtual robot that can exhibit empathy while interacting with a user, with the aid of its built-in facial and emotion recognition, sentiment analysis, and speech module. At the end of the 5-10 minute conversation, Zara can give a personality analysis of the user based on all the user utterances. We have also implemented real-time emotion recognition using a CNN model that detects emotion from raw audio without feature extraction, and have achieved an average of 65.7% accuracy on six different emotion classes, which is a 4.5% improvement over conventional feature-based SVM classification. We also describe a CNN-based sentiment analysis module, trained using out-of-domain data, that recognizes sentiment from the speech recognition transcript and has a 74.8 F-measure when tested on human-machine dialogues.
A high-speed train needs high-level maintenance when its accumulated running mileage or time reaches a predefined threshold. The date of delivering an Electric Multiple Unit (EMU) train to maintenance ranges within a time window rather than being a fixed date. Changing the delivery date means a different impact on the supply of EMU trains and on operation cost. Therefore, the delivery plan has the potential to be optimized. This paper formulates the EMU train high-level maintenance planning problem as a non-linear 0-1 programming model. The model aims at minimizing the mileage loss of all EMU trains with consideration of the maintenance capacity of the workshop and the maintenance ratio at different times. The number of trains under maintenance not only depends on the current maintenance plan, but is also influenced by the trains whose maintenance time spans from the last planning horizon to the current horizon. A state function is established to describe whether a train is un...
Zara, or ‘Zara the Supergirl’, is a virtual robot that can show empathy while interacting with a user, and at the end of a 5-10 minute conversation, it can give a personality analysis based on the user responses. It can display and share emotions with the aid of its built-in sentiment analysis, facial and emotion recognition, and speech module. Being the first of its kind, it has successfully integrated an empathetic system, along with human emotion recognition and sharing, into an augmented human-robot interaction system. Zara was also displayed at the World Economic Forum held at Dalian in September 2015.
Our team intends to develop a recommendation system for job seekers based on the information of current employees in big companies. Several models are implemented to achieve over 60% success rate in classifying employees, and we use these models to help job seekers identify their best-fitting company.
In this paper, we reveal the attenuation mechanism of the anchor of commodity money from the perspective of logistics warehousing costs, and propose a novel Decayed Commodity Money (DCM) for the store of value across time and space. Considering the logistics cost of commodity warehousing by a third-party financial institution such as the London Metal Exchange, we can award the difference between the original and the residual value of the anchor to the financial institution. This type of currency has the characteristic of self-decaying value over time. Therefore, DCM has the advantages of both commodity money, which has the function of preserving wealth, and credit currency, which has no logistics cost. In addition, DCM can also avoid the defects that precious metal money is hoarded by the market and that credit currency often leads to excessive liquidity. DCM is also different from virtual currency, such as bitcoin, which does not have a corresponding commodity anchor. In conclusion, DCM can provid...
Sentiment analysis of reviews is a popular task in natural language processing. In this work, the goal is to predict the score of food reviews on a scale of 1 to 5 with two recurrent neural networks that are carefully tuned. As a baseline, we train a simple RNN for classification. Then we extend the baseline to a GRU. In addition, we present two different methods to deal with highly skewed data, which is a common problem for reviews. Models are evaluated using accuracy.
An essential problem encountered in a railway freight transport system is determining the best formation plan on a capacity-constrained physical network. The formation plan is not only the foundation of railway operations, but also the basis of train scheduling, yard and terminal management, and infrastructure resource planning. An integrated optimization of the traffic routing and train formation plan aims at designing a globally optimal train service network to decide: between which pairs of reclassification yards (terminals) a block should be provided, the frequencies of the train services, the physical paths of traffic, and the block sequences of shipments. This study proposes a non-linear binary programming model to address the integrated problem and minimize the total costs of accumulation, reclassification, and travel distances while satisfying various practical requirements. An efficient simulated annealing based heuristic solution approach is developed to solve the mathematical model. We use a penalty function method to tackle the capacity constraints and a customized method for the operational requirements. The feasibility of the solution approach is tested using a numerical case study based on the east-west railway channel of China. The computational results of a national-scale example based on the China railway network consisting of 235 nodes indicate that the proposed model and approach can achieve good quality solutions within an acceptable computing time.
Papers by Ruixi Lin