Steps For Data Processing
• Data selection
• Data preprocessing
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation
Data Cleaning:
• Also known as data cleansing.
• In this phase, noisy and irrelevant data are
removed from the collection.
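A minimal sketch of this phase, assuming "noise" means missing or out-of-range values and "irrelevant data" means fields the analysis does not need. The records, the field names, and the `clean` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical sensor readings: None marks a missing value,
# and "comment" is a field that is irrelevant to the analysis.
records = [
    {"id": 1, "temp": 21.5, "comment": "ok"},
    {"id": 2, "temp": None, "comment": "sensor offline"},
    {"id": 3, "temp": 480.0, "comment": "spike"},
    {"id": 4, "temp": 22.1, "comment": "ok"},
]


def clean(rows, field, low, high, keep):
    """Drop rows whose `field` is missing or outside [low, high],
    and keep only the columns named in `keep`."""
    cleaned = []
    for row in rows:
        value = row.get(field)
        if value is None or not (low <= value <= high):
            continue  # treated as noise under the assumed range
        cleaned.append({key: row[key] for key in keep})
    return cleaned


cleaned = clean(records, "temp", low=-40.0, high=60.0, keep=("id", "temp"))
print(cleaned)
```

The valid range is an assumption supplied by the analyst; real cleaning often repairs or imputes values instead of dropping them.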
Data Integration:
In this stage, multiple data sources, often
heterogeneous, may be combined into a common
source.
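As a rough illustration, two sources that name the same key differently can be combined into one record set. The two sources, their field names, and the `integrate` helper are hypothetical.

```python
# Two hypothetical, heterogeneous sources describing the same customers.
crm_rows = [
    {"cust_id": 1, "name": "Ana"},
    {"cust_id": 2, "name": "Raj"},
]
billing_rows = [
    {"customer": 1, "total_spent": 120.0},
    {"customer": 3, "total_spent": 45.5},
]


def integrate(crm, billing):
    """Merge both sources into one list with a common schema:
    id, name, total_spent. Missing values are left as None."""
    merged = {}
    for row in crm:
        merged[row["cust_id"]] = {
            "id": row["cust_id"],
            "name": row["name"],
            "total_spent": None,
        }
    for row in billing:
        entry = merged.setdefault(
            row["customer"],
            {"id": row["customer"], "name": None, "total_spent": None},
        )
        entry["total_spent"] = row["total_spent"]
    return [merged[key] for key in sorted(merged)]


integrated = integrate(crm_rows, billing_rows)
for row in integrated:
    print(row)
```

This sketch keeps customers that appear in only one source; whether to keep or drop them is a design choice that depends on the analysis.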
Data Selection:
In this step, the data relevant to the analysis
is decided on and retrieved from the collection.
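Selection can be sketched as choosing both the rows and the attributes that matter for a given analysis. The customer table, the region filter, and the `select` helper below are assumptions made for illustration.

```python
# Hypothetical integrated collection.
customers = [
    {"id": 1, "region": "north", "age": 34, "phone": "555-0101", "total_spent": 120.0},
    {"id": 2, "region": "south", "age": 51, "phone": "555-0102", "total_spent": 80.0},
    {"id": 3, "region": "north", "age": 27, "phone": "555-0103", "total_spent": 45.5},
]


def select(rows, columns, predicate):
    """Keep rows for which `predicate` is true, projected onto `columns`."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]


# Assumed analysis: spending versus age for the northern region only.
selected = select(
    customers,
    columns=("age", "total_spent"),
    predicate=lambda row: row["region"] == "north",
)
print(selected)
```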
• Pattern Evaluation:
Here, strictly interesting patterns representing
knowledge are identified based on given
measures.
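The slide does not name the measures. Support and confidence of an association rule are two commonly used ones, so the sketch below assumes them; the transactions are made up.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule bread -> milk holds in 2 of 4 transactions (support 0.5) and in 2 of the 3 transactions that contain bread (confidence about 0.67).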
• Knowledge Representation: the final phase,
in which the discovered knowledge is
visually represented to the user.
• This essential step uses visualization
techniques to help users understand and
interpret the data mining results.
Architecture of Data Mining System
• Database, data warehouse, World Wide Web,
and other information repositories:
This is one or a set of databases, data
warehouses, spreadsheets, or other kinds of
information repositories. Data cleaning and
integration techniques may be performed on the data.
• Database or Data Warehouse Server:
The database or data warehouse server is
responsible for fetching the relevant data, based on
the user’s data mining request.
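One way to picture the server's role is a parameterized query that returns only the rows a mining request needs. The sketch uses an in-memory SQLite database as a stand-in for a real database or warehouse server; the `sales` table and the `fetch_relevant` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "bread", 3.0),
        ("south", "milk", 2.5),
        ("north", "milk", 1.5),
    ],
)


def fetch_relevant(conn, region):
    """Return (product, amount) rows for the region named in the request."""
    cursor = conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY product",
        (region,),
    )
    return cursor.fetchall()


rows = fetch_relevant(conn, "north")
print(rows)
```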
• Knowledge base:
This is the domain knowledge that is used to guide
the search or evaluate the interestingness of
resulting patterns. Such knowledge can include
concept hierarchies, used to organize attributes or
attribute values into different levels of abstraction.
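A concept hierarchy can be sketched as a child-to-parent mapping that lets attribute values be generalized to a higher level of abstraction. The location hierarchy (city, country, continent), the sales figures, and both helpers are assumed for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy for a "location" attribute.
parent_of = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Chicago": "USA",
    "Canada": "North America",
    "USA": "North America",
}


def generalize(value, levels):
    """Climb `levels` steps up the hierarchy; stop at the top if reached."""
    for _ in range(levels):
        value = parent_of.get(value, value)
    return value


def roll_up(counts, levels):
    """Aggregate per-value counts at a higher level of abstraction."""
    totals = Counter()
    for value, count in counts.items():
        totals[generalize(value, levels)] += count
    return dict(totals)


city_sales = {"Vancouver": 10, "Toronto": 5, "Chicago": 7}
by_country = roll_up(city_sales, levels=1)
by_continent = roll_up(city_sales, levels=2)
print(by_country, by_continent)
```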
• Data mining Engine:
This is essential to the data mining system and
ideally consists of a set of functional modules
for tasks such as characterization, association
and correlation analysis, classification,
prediction, cluster analysis, outlier analysis, and
evolution analysis.
• Pattern evaluation module:
• This component typically employs
interestingness measures and interacts with
the data mining modules so as to focus the
search toward interesting patterns.
• It may use interestingness thresholds to filter
out discovered patterns.
• This module can be integrated with the mining
modules, depending on the implementation of the
data mining method used.
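Threshold-based filtering can be sketched as follows. The choice of support and confidence as the interestingness measures, the threshold values, and the discovered patterns are all assumptions made for the example.

```python
# Hypothetical patterns as they might be returned by a mining module.
patterns = [
    {"rule": "bread -> milk", "support": 0.50, "confidence": 0.67},
    {"rule": "milk -> butter", "support": 0.25, "confidence": 0.33},
    {"rule": "butter -> bread", "support": 0.50, "confidence": 1.00},
    {"rule": "bread -> butter", "support": 0.05, "confidence": 0.90},
]


def filter_patterns(patterns, min_support, min_confidence):
    """Keep only patterns that meet both interestingness thresholds."""
    return [
        p
        for p in patterns
        if p["support"] >= min_support and p["confidence"] >= min_confidence
    ]


interesting = filter_patterns(patterns, min_support=0.3, min_confidence=0.6)
print([p["rule"] for p in interesting])
```

Applying the same thresholds inside the mining module, rather than afterwards, is the integrated variant: candidates that cannot meet the thresholds are pruned before they are fully explored.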