Many classifiers are available for data classification, and selecting the best one is a critical problem. The choice of pre-processing approach is also important. This paper surveys approaches to improving classification accuracy in data mining. The purpose of pre-processing is to make the classes as distinct as possible before the classifier is trained or tested. Handling noise and outliers is an important part of improving classification accuracy, which also depends on the quality of the data used. Feature selection can further refine the dataset before it is passed to the learning algorithm, improving the accuracy of the classifier.
Computers & Operations Research, 2009
In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining), that employs aspects of the methodological framework of the ELECTRE I outranking method, and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then it assigns the classes that correspond to these rules, to the objects that must be classified. Three well known data mining classification algorithms are tested in five different widely used databases to verify the robustness of the proposed method.
2015
The field of data mining and knowledge discovery in databases (KDD) has been growing by leaps and bounds and has shown great potential for the future [10]. Data classification is an important task in the KDD process, with several potential applications. The performance of a classifier depends strongly on the learning algorithm. In this paper, we describe our experiments on data classification with several classification models. We tabulate the experimental results and present a comparative analysis. Keywords: knowledge discovery in databases, classifier, data classification.
2009
This paper analyzes various problems that arise in data mining. Issues of data quality are discussed, with the main focus on feature selection and its influence on classification results. Feature selection, or discovery of an optimal data set, is the process of removing features that are not useful for decision making and keeping the most useful ones. The influence of feature selection is analyzed for different classification algorithms. They are applied to two data sets of different composition to solve three problems in the medical domain. The presented results show that there is no universal algorithm that could solve any problem, and that each data set has its own optimal (sub)set suited to a given classification algorithm. Methodological recommendations for reaching a near-optimal solution are given to support clinical decision making.
ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal
The aim of data mining is to classify and analyze given data so that it becomes clearly understandable and discoverable for learners and researchers. Different types of classifiers exist to classify data for the best and most accurate results. Primary data is taken, classified into different parts, analyzed to remove any ambiguities, and finally made understandable. Through this process, primary data becomes secondary data and is called information. Classifiers follow the same strategy to reach accurate results. In this paper, different data mining approaches are used by applying different classifiers to a chosen data set. The data set consists of segregated data on 500 candidates, which is analyzed and evaluated to classify it and to show accurate results using the proposed algorithms. The data mining approaches have been used in which HUGO (...
Data mining is now one of the most active fields of research. Extracting nuggets of information is becoming crucial, and one of its important techniques is classification, which groups data into predefined classes. Various classification techniques exist, each using a different algorithm, and each algorithm has its own areas of best and worst performance. This paper concentrates on four of the best-known algorithms, namely Decision Tree, Naïve Bayes, K Nearest Neighbour and Genetic Programming, and on how their running time and accuracy change as the number of instances is incrementally decreased. It also investigates how results differ between binary-class and multiclass datasets and suggests which algorithms to use with particular kinds of dataset.
Data mining refers to extracting, or mining, knowledge and information from large amounts of data. Its main purpose is data analysis. Techniques used in data mining include Association Rule Mining, Sequential Pattern Mining, Clustering, and Classification. Classification is a data mining technique used to predict the class label or class membership of data. In this paper, we present the basic classification techniques, covering several major kinds: Decision Trees (DTs), Naive Bayes, K-Nearest Neighbor (K-NN), and Artificial Neural Networks (ANN). The main goal of this survey is to provide a comparative review of classification techniques in data mining.
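To make one of the surveyed techniques concrete, here is a minimal sketch of K-nearest-neighbor classification. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are illustrative and do not come from the survey.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` from labelled training points.

    `train` is a list of (feature_tuple, label) pairs (hypothetical layout).
    """
    # Keep the k training points closest to the query (Euclidean distance).
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy two-class data set, invented for illustration only.
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (1.1, 1.0)))
    print(knn_predict(TRAIN, (5.1, 5.0)))
```

In practice the choice of k and of the distance measure affects accuracy, which is one reason comparative reviews of this kind evaluate classifiers on several data sets.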
The use of computer technology in medical diagnosis and disease prediction has increased. In these fields, computers are used with intelligent techniques such as fuzzy logic, artificial neural networks and genetic algorithms. Many data mining techniques are useful in medicine, and many algorithms have been developed. The main objective of this work is to identify the attributes that matter most for classifier accuracy and to reduce the dimensionality of a disease dataset for classification. The other objective is to classify the dataset in a cost-effective manner, as many tests are redundant and highly expensive. We used several approaches to feature selection, including a brute-force approach and a correlation-based approach. We also show that classifier accuracy improves with feature selection.
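As an illustration of what a correlation-based filter can look like, the sketch below ranks features by the absolute Pearson correlation between each feature and a numeric class label. This is an assumed, simplified variant and not necessarily the authors' method; the names `pearson` and `rank_features` and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def rank_features(rows, labels):
    """Return feature indices ordered from most to least correlated with the label."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties broken by feature index.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy data, invented for illustration: feature 1 is constant (uninformative).
ROWS = [(1, 7, 3), (2, 7, 1), (3, 7, 4), (4, 7, 1)]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    print(rank_features(ROWS, LABELS))
```

Keeping only the top-ranked indices reduces dimensionality, and in a medical setting it can also mean fewer tests need to be ordered.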
International Journal of Engineering Sciences & Research Technology, 2013
Data mining is part of the knowledge discovery process and of the information industry, driven by the wide availability of large amounts of data. It is a comprehensive application of technology that relies on statistical analysis and artificial intelligence; it has shown great commercial value and is gradually penetrating other sectors such as retail, insurance, telecommunications and the power industry. Data mining techniques usually fall into two categories: predictive and descriptive. Predictive mining predicts the trends and properties of unknown data based on known data. Descriptive mining describes concepts or task-relevant data sets in concise, summarative, informative and discriminative forms. The objective of this paper is to analyze various classification algorithms in data mining: KNN, Decision Tree, Naïve Bayes and Neural Network. The algorithms' performance is analyzed along various dimensions.
Journal of Information Engineering and Applications, 2019
Text mining is a special case of data mining that explores unstructured or semi-structured text documents to establish valuable patterns and rules indicating trends and significant features of specific topics. Text mining has been applied in pattern recognition, predictive studies, sentiment analysis and statistical theories across many areas, including research, medicine, financial analysis, social life analysis, and business intelligence. It draws on concepts from natural language processing and machine learning. Machine learning algorithms have been reported to give great results, but their performance is affected by factors such as dataset domain, number of classes, length of the corpus, and the feature selection techniques used. Redundant attributes degrade the performance of a classification algorithm, but this effect can be reduced with feature selection and dimensionality reduction techniques. Feature selection is a data preprocessing step that chooses a subset of input variables while eliminating features with little or no predictive information. Feature selection techniques include Information Gain, Term Frequency, Term Frequency-Inverse Document Frequency, Mutual Information, and Chi-Square, and they can follow a filter, wrapper, or embedded approach. To get the most value from machine learning, the best algorithms must be paired with the right tools and processes. Little research has been done on how feature selection techniques affect classification accuracy, or on pairing algorithms with the feature selection techniques that give optimal results. In this research, a text classification experiment was conducted on an incident management dataset, where incidents were classified into their resolver groups. Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes (NB) and Decision Tree (DT) machine learning algorithms were examined.
A filter approach was used for the feature selection techniques, with different ranking indices applied to find the optimal feature set, and the resulting classification accuracies were analyzed. With TF, accuracy was 88% for SVM, 70% for NB, 79% for DT, and 55% for KNN; with Boolean weighting, it was 90%, 83%, 82% and 75% for SVM, NB, DT, and KNN respectively; with TF-IDF, it was 91%, 83%, 76%, and 56% for SVM, NB, DT, and KNN respectively. The results showed that algorithm performance is affected by the feature selection technique applied. SVM performed best, followed by DT, KNN and finally NB. In conclusion, the presence of noisy data leads to poor learning performance and increases computational time. The classifiers performed differently depending on the feature selection technique applied. For optimal results, the best-performing classifier should be applied together with the feature selection technique that yields the best feature subset, for all types of data, to achieve accurate classification performance.
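The three weighting schemes compared above (TF, Boolean, TF-IDF) can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace tokenisation, raw counts for TF, and the unsmoothed inverse document frequency log(N / df). The abstract does not specify the exact variants used, and the function name `term_weights` and the sample incident texts are invented.

```python
import math
from collections import Counter


def term_weights(docs, scheme="tf"):
    """Return (vocabulary, document vectors) under the chosen weighting scheme."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenised for term in doc})
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenised for term in set(doc))

    vectors = []
    for doc in tokenised:
        counts = Counter(doc)
        row = []
        for term in vocab:
            tf = counts[term]
            if scheme == "boolean":
                row.append(1.0 if tf else 0.0)  # presence or absence only
            elif scheme == "tf":
                row.append(float(tf))  # raw term count
            elif scheme == "tfidf":
                # Assumed variant: raw count times unsmoothed log(N / df).
                row.append(tf * math.log(n_docs / df[term]))
            else:
                raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(row)
    return vocab, vectors


# Invented incident descriptions, for illustration only.
DOCS = ["printer offline printer", "email offline", "email password reset"]

if __name__ == "__main__":
    for scheme in ("tf", "boolean", "tfidf"):
        vocab, vectors = term_weights(DOCS, scheme)
        print(scheme, vocab, vectors[0])
```

The resulting vectors are what a classifier such as SVM or KNN would consume; Boolean weighting discards repeat counts, while TF-IDF down-weights terms that appear in many documents.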