Three Dimensional Model For Diagnostic Prediction: A Data Mining Approach
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 4, July August 2013 ISSN 2278-6856
1. INTRODUCTION
One of the major approaches provided by data mining is the prediction algorithm. A prediction system observes statistical and historical data to estimate the chance that some event will occur in the near future. Health care is one of the major research areas to which data mining prediction systems are applied, and they are involved in health care applications in many ways. The reliability of such a system depends on two major factors: a reliable dataset, and the suggestions and involvement of experts. The dataset required for such a system includes patient data, symptoms and the related treatment data. Datasets of this kind are mainly maintained by the organizations themselves. The trends and patterns identified in these datasets help in making reliable decisions.
In this work, we present a prediction model to identify the patient's disease. The present work is a parametric approach to this kind of decision making. However, a single indicator or factor cannot decide the disease or the diagnostic. For better results, we need a large dataset with a large number of attributes that can represent a patient completely, including behavior and physical characteristics. Another factor we have to decide is the disease for which the whole work will be carried out, since it is not feasible to study all diseases, their related symptoms and the diagnostics.
There are a number of health care applications where the mining process has provided an automated solution. However, such applications always suffer from a lack of standardization. For example, in the case of hypertension, different experts may conceive it differently, and the effect of a medicine or treatment may vary from patient to patient. A number of such issues must be considered while performing the mining process in the medical area. To resolve these issues, such applications must be developed under expert observation. In the present work, we define the model of an application that is used to predict the patient's disease. Section 3 of this paper gives the model and its detailed description. Before that, some key terms used in this work are defined below.
1.1 Classification of Dataset
A medical dataset always consists of a large number of attributes and their related data, and it is neither feasible nor efficient to work on the entire dataset every time. For this reason, some classification approach is required to reduce the size of the raw dataset. In the proposed work, different classes will be generated based on the patient's age or gender. The first step in class generation is to identify the classes; for the age attribute, the possible classes can be kid, young, middle-age, old, very old, etc. The next step is to assign age groups to these classes. Finally, based on this class assignment, we select the particular class on which the complete work will be carried out. The classes are principally required to reduce the size of the actual dataset on which the work is performed.
1.2 Associativity between Data-Items
Another important concept of data mining is to identify the relationships among different data fields. To enhance performance, in this work association is identified at two levels: first, between the patient's symptoms and the related disease, and second, between the disease and the associated diagnostics. To perform this kind of analysis, the support vector will be identified. The data fields that are highly associated will be retained in the dataset.
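The first level of association from Section 1.2 (symptom to disease) can be illustrated with a small sketch. It assumes that "highly associated" is measured with the support and confidence used in standard association rule mining, which the paper does not state explicitly. The records, the threshold values and the function names `rule_stats` and `strong_rules` are hypothetical.

```python
from itertools import product

# Hypothetical patient records: (set of symptoms, diagnosed disease).
records = [
    ({"headache", "dizziness"}, "hypertension"),
    ({"headache", "fatigue"}, "hypertension"),
    ({"fatigue", "thirst"}, "diabetes"),
    ({"headache", "thirst"}, "diabetes"),
    ({"dizziness", "fatigue"}, "hypertension"),
]

def rule_stats(records, symptom, disease):
    """Return (support, confidence) of the rule symptom -> disease."""
    n = len(records)
    with_symptom = sum(1 for s, _ in records if symptom in s)
    with_both = sum(1 for s, d in records if symptom in s and d == disease)
    support = with_both / n
    confidence = with_both / with_symptom if with_symptom else 0.0
    return support, confidence

def strong_rules(records, min_support=0.3, min_confidence=0.6):
    """Keep only symptom -> disease pairs that pass both thresholds.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    symptoms = sorted(set().union(*(s for s, _ in records)))
    diseases = sorted({d for _, d in records})
    kept = {}
    for symptom, disease in product(symptoms, diseases):
        support, confidence = rule_stats(records, symptom, disease)
        if support >= min_support and confidence >= min_confidence:
            kept[(symptom, disease)] = (support, confidence)
    return kept

for rule, (support, confidence) in strong_rules(records).items():
    print(rule, round(support, 2), round(confidence, 2))
```

The same function could be reused for the second level by replacing the symptom sets with diseases and the disease labels with diagnostics; pairs that fail the thresholds are the ones that would be pruned from the dataset.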
2. LITERATURE SURVEY
For efficient and time-effective analysis, feature subset selection is very important. Asha Gowda Karegowda and M.A. Jayaram [1] developed a two-stage model for Indian diabetic databases. As their results show, the model filters the input data substantially, reducing its size for further analysis with back-propagation neural networks. Their work demonstrated that feature subset selection improves classification accuracy. Another major contribution, a methodology for cluster validation, comes from Linda J. Moniz, Brian H. Feighner and Joseph S. Lombardo [2]. They described a narrative approach for generating Electronic Medical Records, focusing on the data mining steps, and verified the approach on a large and varied dataset.
In 2011, Hnin Wint Khaing [3] worked on disease prediction based on dataset fragmentation. Working on a heart disease dataset, the author applied k-means clustering to generate the fragments and then analyzed sequential patterns to predict the disease. A decision tree approach was used to present the results. In 2010, Wei Wang [4] applied association mining to a medical dataset to generate rules that decide the disease support from symptoms. Considering the case of heart disease, the work generated a class hierarchy to extract association-based rules.
In 2011, Chen-Guang Zhao [5] worked on a rough medical dataset and filtered it to improve fault tolerance. The proposed system defined decision support rules that filter the dataset to produce a reduced dataset, on which a decision rule is applied to predict the disease. In 2010, Mostafa Fathi Ganji [6] used an ACO (Ant Colony Optimization) influenced fuzzy approach to classify a medical dataset. A fuzzy rule set was generated from the available dataset, and an ACO-based classification approach was implemented on top of this rule set.
Li, Fu and Williams [7] discussed the problem of finding risk patterns in medical datasets. They presented an efficient algorithm, based on the anti-monotone property, to mine optimal risk pattern sets.
3. RESEARCH METHODOLOGY
In this work, we aggregate the strengths of three different data mining aspects to predict the patient's disease as well as the diagnostic. The first aspect is statistical analysis, for which we need a large amount of data over a large attribute set. Statistical analysis is one of the traditional approaches used in mining to draw conclusions from the given dataset. This work will include improved association mining along with the fuzzy concept. The association mining will perform two important tasks: the first is to correlate the related attributes, and the second is to prune unwanted data and attributes from the dataset.
Figure 1: Three Dimensions of Proposed Work
Another aspect is the use of a cubic model in terms of time series. Medical applications have large datasets. Here, we use the concept of a deductive database, in which the dataset is itself reduced according to the user query. Most disease symptoms give different criteria for male and female patients, so the first level of classification can be based on gender. This classification can be followed by patient age, as different age groups require different treatment. For this categorization, we need to perform data clustering. The number of clusters will depend on the number of age groups as well as gender.
The final aspect is based on the prediction approach. In this work, the Markov model is suggested to take the decision about the disease and the diagnostic.
Algorithm(P, D[], S[][], Di[])
/* Here P is the information taken as input from the patient; it includes the basic details as well as the symptoms of the disease. D is the database of possible Diseases, S is the dataset of Symptoms and Di is the dataset of related Diagnostics */
{
i) Accept the Patient Age, Gender and Symptoms as the raw input.
ii) Divide the Symptoms Database according to
iii)
iv)
v)
vi)
vii)
viii)
ix)
x) }
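The paper suggests a Markov model for the final decision but does not describe how it is built. As one possible reading, the sketch below estimates a first-order Markov chain from patient histories and picks the most probable next state. The histories, the state names and the functions `fit_transitions` and `most_likely_next` are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical histories: ordered states (symptom, disease, diagnostic).
histories = [
    ["chest_pain", "heart_disease", "ecg"],
    ["chest_pain", "heart_disease", "ecg"],
    ["chest_pain", "heart_disease", "angiography"],
    ["chest_pain", "acid_reflux", "endoscopy"],
]

def fit_transitions(histories):
    """Estimate P(next state | current state) from observed sequences."""
    counts = defaultdict(Counter)
    for seq in histories:
        # Count every adjacent pair (current, next) in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }

def most_likely_next(transitions, state):
    """Return the most probable successor of `state`, or None if unseen."""
    options = transitions.get(state)
    if not options:
        return None
    return max(options, key=options.get)

transitions = fit_transitions(histories)
disease = most_likely_next(transitions, "chest_pain")
diagnostic = most_likely_next(transitions, disease)
print(disease, diagnostic)
```

Under this reading, the symptom-to-disease and disease-to-diagnostic steps are two successive transitions of the same chain; a real system would need far more data and expert review of the resulting probabilities.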
Figure 2: Proposed Architecture
In Figure 2, the gray boxes represent the processes, and the white boxes represent the input and output of each process. With each step, the size of the database is reduced, as some deduction is performed on the database to increase the efficiency of the process. The final results are obtained once the processes have been performed in sequence.
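The step-by-step reduction of the database can be sketched as follows, using the gender and age classification described in this section. The class names come from Section 1.1, but the paper gives no age ranges, so the boundaries in `AGE_BOUNDS`, the sample records and the function names are assumptions made for illustration.

```python
from bisect import bisect_right

# Class names follow Section 1.1; the boundaries are hypothetical.
# Each bound is the exclusive upper limit of the class at the same index.
AGE_BOUNDS = [13, 30, 50, 70]
AGE_CLASSES = ["kid", "young", "middle-age", "old", "very old"]

def age_class(age):
    """Map an age in years to one of the age classes."""
    return AGE_CLASSES[bisect_right(AGE_BOUNDS, age)]

def reduce_dataset(records, gender, age):
    """Filter by gender first, then by the age class of the given age."""
    target = age_class(age)
    by_gender = [r for r in records if r["gender"] == gender]
    by_age = [r for r in by_gender if age_class(r["age"]) == target]
    return by_gender, by_age

# Hypothetical patient records.
records = [
    {"age": 8, "gender": "F", "symptoms": ["fever"]},
    {"age": 34, "gender": "M", "symptoms": ["headache"]},
    {"age": 45, "gender": "M", "symptoms": ["dizziness"]},
    {"age": 62, "gender": "M", "symptoms": ["chest_pain"]},
    {"age": 41, "gender": "F", "symptoms": ["fatigue"]},
]

by_gender, by_age = reduce_dataset(records, gender="M", age=40)
print(len(records), len(by_gender), len(by_age))  # prints: 5 3 2
```

Each stage works only on the output of the previous one, so the later and more expensive steps see a smaller dataset.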
4. CONCLUSION
The proposed system is an architecture-based system in which three data mining approaches are combined in sequence to predict the patient's disease as well as the diagnostic. The system starts with a rough (larger) dataset, and the data is reduced at each level so that the work can be performed efficiently. In the initial stage, data classification and clustering techniques are used to predict the disease; later, association mining along with the Markov model is suggested to predict the diagnostic. The proposed model is a generic model that can work on any raw medical dataset.
REFERENCES
[1] Asha Gowda Karegowda, M.A. Jayaram, Cascading GA & CFS for Feature Subset Selection in Medical Data Mining, 2009 IEEE International Advance Computing Conference (IACC 2009)
[2] Linda J. Moniz, Brian H. Feighner, and Joseph S. Lombardo, Mining Electronic Medical Records for Patient Care Patterns, 978-1-4244-2765-9/09/$25.00 2009 IEEE
[3] Hnin Wint Khaing, Data Mining based Fragmentation and Prediction of Medical Data, 978-1-61284-840-2/11/$26.00 2011 IEEE
[4] Wei Wang, Yaohua Wu, Mining Association Rules in Medical Data Based on Concept Lattice, Proceedings of the 8th World Congress on Intelligent Control and Automation, July 6-9 2010, Jinan, China
[5] Chen-Guang Zhao WenGe, Research of Electronic Medical Record Data Mining Method Based on Variable Precision Rough Set, 2011 International Conference on Electronics and Optoelectronics (ICEOE 2011)
[6] Mostafa Fathi Ganji, and Mohammad Saniee Abadeh, Parallel Fuzzy Rule Learning Using an ACO-Based Algorithm for Medical Data Mining, 978-1-4244-6439-5/10/$26.00 2010 IEEE
[7] Jiuyong Li, Ada Wai-Chee Fu, Hongxing He, Jie Chen, Mining Risk Patterns in Medical Data, KDD'05, August 21-24, 2005, Chicago, Illinois, USA. Copyright 2005 ACM 159593135X/05/0008 ...$5.00.
[8] Yen-Ting Kuo, Andrew Lonie, Liz Sonenberg, Domain Ontology Driven Data Mining, 2007 ACM SIGKDD Workshop on Domain Driven Data Mining