Thuy Nguyen

Papers by Thuy Nguyen

Classification optimization for training a large dataset with Naïve Bayes

Journal of Combinatorial Optimization, 2020

Book classification is very popular in digital libraries, and book rating prediction is crucial for serving readers better. Commonly used techniques include decision trees, Naïve Bayes (NB), and neural networks. Moreover, mining book data depends on feature selection, data pre-processing, and data preparation. This paper proposes solutions for optimizing knowledge representation and feature selection to enhance book classification, and identifies appropriate classification algorithms. Several experiments were conducted, and NB was found to provide the best prediction results. The accuracy and performance of NB can be improved, to the point of outperforming other classification algorithms, by applying appropriate strategies for feature selection, data type selection, and data transformation.

Keywords: Data mining • Naïve Bayes • Word embedding • Feature selection

Method    Accuracy
          Case (3.1.1)    Case (3.1.2)    Case (3.1.3)    Case (3.1.4)
NB        71.3028         72.1206         72.1238         72.0601

The accuracy of NB is highest in Case (3.1.3).

4.3 Case 3: Attribute selection strategies are applied

In Case 3, we apply several attribute selection strategies to optimize the NB model and to evaluate how strongly each attribute correlates with the class attribute when making book-rating predictions. First, we assess whether the transformed attributes are useful compared with the corresponding original attributes. Second, several attribute selection methods are run and compared on the dataset with nominal or numeric attribute types. In particular, three sub-cases are carried out, as follows:

Case 3.1: assess the worth of each transformed attribute in the dataset;
Case 3.2: assess attribute selection strategies for NB-based classification; and
Case 3.3: same as Case 3.2, but the type of all attributes is set to nominal.
4.3.1 Case 3.1: Assess the worth of each transformed attribute in the dataset

Experiments were conducted in which an original attribute is replaced, one at a time, with a transformed attribute, to assess whether the replacement affects NB-based classification. For example, the ISBN attribute of the dataset in Case 1 is replaced with the titleVecSumLen attribute, and so on. The experimental cases are designed as follows.

Case (3.1.1): same as Case 1, but "ISBN" is replaced with "titleDotPro".
Case (3.1.2): same as Case 1, but "ISBN" is replaced with "titleVecSumLen".
Case (3.1.3): same as Case 1, but "ISBN" is replaced with "titleVecMean".
Case (3.1.4): same as Case 1, but "ISBN" is replaced with "Review_Vec".

Compared with the result of NB in Case 1, where the accuracy is 71.3135, selecting "titleVecMean" instead of "ISBN" improves the accuracy of NB (Table 4, Case 3.1.3). We continue with the next experiments, as follows.

Case (3.1.5): same as Case (3.1.3), but "location" is replaced with "latitude" and "longitude".
Case (3.1.6): same as Case (3.1.3), but "location" is replaced with "latitude".
Case (3.1.7): same as Case (3.1.3), but "location" is replaced with "longitude".

Compared with the result of NB in Case (3.1.3), replacing "location" with "latitude" improves the accuracy of NB-based classification (Table 5). Generally speaking, the transformed data can improve classification performance, especially when using "titleVecMean" and "latitude".
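The attribute-replacement protocol of Case 3.1 (train NB on one attribute set, swap a single attribute, and compare accuracy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the categorical NB with Laplace smoothing is a generic textbook version, not necessarily the implementation used in the paper, and the toy rows, the bin labels for "titleVecMean", the "location" values, and the helper names train_nb, predict_nb, and accuracy are all invented. The numbers it produces say nothing about the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # value_counts[attr][label][value] = number of training rows with that value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[attr][label][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen, alpha


def predict_nb(model, row):
    """Return the label with the highest log-posterior for one row."""
    class_counts, value_counts, values_seen, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)
        for attr, value in row.items():
            count = value_counts[attr][label][value]
            n_values = len(values_seen[attr]) + 1  # one extra slot for unseen values
            score += math.log((count + alpha) / (n_label + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(train, test, attrs):
    """Train on the chosen attributes only and return accuracy on the test rows."""
    def project(row):
        return {a: row[a] for a in attrs}

    model = train_nb([project(r) for r, _ in train], [y for _, y in train])
    hits = sum(predict_nb(model, project(r)) == y for r, y in test)
    return hits / len(test)


# Invented toy rows (NOT the paper's data). "titleVecMean" is shown as a
# hypothetical nominal bin; every ISBN is unique, so it cannot generalise.
TRAIN = [
    ({"ISBN": "i1", "titleVecMean": "binA", "location": "north"}, "high"),
    ({"ISBN": "i2", "titleVecMean": "binA", "location": "south"}, "high"),
    ({"ISBN": "i3", "titleVecMean": "binB", "location": "north"}, "low"),
    ({"ISBN": "i4", "titleVecMean": "binB", "location": "south"}, "low"),
]
TEST = [
    ({"ISBN": "i5", "titleVecMean": "binA", "location": "north"}, "high"),
    ({"ISBN": "i6", "titleVecMean": "binB", "location": "south"}, "low"),
]

# Baseline attribute set versus the set with "ISBN" swapped for "titleVecMean".
acc_isbn = accuracy(TRAIN, TEST, ["ISBN", "location"])
acc_title = accuracy(TRAIN, TEST, ["titleVecMean", "location"])
print("with ISBN:", acc_isbn)
print("with titleVecMean:", acc_title)
```

The same accuracy helper could be called once per case, for instance with "latitude" in place of "location", which mirrors how Cases (3.1.5) to (3.1.7) vary one attribute at a time.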

Efficient Boosting-Based Active Learning For Specific Object Detection Problems

In this work, we present a novel active learning approach for learning a visual object detection system. Our system is composed of an active learning mechanism that wraps a sub-algorithm implementing an online boosting-based object detector. At its core is a combination of a bootstrap procedure and a semi-automatic learning process based on the online boosting procedure. The idea is to exploit the availability of the classifier during learning to automatically label training samples and thereby progressively improve the classifier. This reduces labeling effort while obtaining better performance. In addition, we propose a verification process to further improve the classifier. The idea is to allow re-updates on previously seen data during learning in order to stabilize the detector. The main contribution of this empirical study is a demonstration that active learning based on an online boosting approach trained in this manner can achieve results comparable or ev...
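The self-labelling idea in this abstract (let the current classifier label samples it is confident about, and ask a human only for the rest) can be sketched as a small loop. This is an illustration under stated assumptions, not the authors' method: the base learner here is a deliberately simple running-mean classifier standing in for the online boosting detector, the samples are one-dimensional numbers rather than image patches, and the names RunningMeanClassifier, active_learn, oracle, and threshold are hypothetical. The verification step with re-updates on seen data is not covered.

```python
class RunningMeanClassifier:
    """Stand-in online learner: one running mean per class (NOT online boosting)."""

    def __init__(self):
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}

    def update(self, x, label):
        """Incrementally absorb one labelled sample."""
        self.sums[label] += x
        self.counts[label] += 1

    def predict(self, x):
        """Return (label, confidence), with confidence in [0, 1]."""
        if min(self.counts.values()) == 0:
            return None, 0.0  # no estimate until both classes have been seen
        d0 = abs(x - self.sums[0] / self.counts[0])
        d1 = abs(x - self.sums[1] / self.counts[1])
        label = 0 if d0 < d1 else 1
        total = d0 + d1
        confidence = abs(d0 - d1) / total if total > 0 else 0.0
        return label, confidence


def active_learn(stream, oracle, threshold=0.5):
    """Auto-label confident samples; query the oracle only for uncertain ones."""
    clf = RunningMeanClassifier()
    queries = 0
    for x in stream:
        label, confidence = clf.predict(x)
        if confidence < threshold:
            label = oracle(x)  # hypothetical human annotator
            queries += 1
        clf.update(x, label)   # online update with the chosen label
    return clf, queries


def oracle(x):
    """Toy ground truth, an assumption for this sketch: values >= 0.5 are positives."""
    return int(x >= 0.5)


STREAM = [0.1, 0.9, 0.15, 0.85, 0.5, 0.05, 0.95]
clf, queries = active_learn(STREAM, oracle)
print("oracle queries:", queries, "of", len(STREAM))
```

The threshold trades labelling effort against the risk of training on wrong self-assigned labels: a higher value sends more samples to the annotator, a lower one trusts the classifier more.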
